Latest recommendations

28 Nov 2025

Bulk-based hypothesis weighing increases power in single-cell differential expression analysis

Bulk RNA-seq to the rescue of differential expression analysis in single-cell transcriptomics

Recommended by Mireya Plass based on reviews by Marcel Schilling and Benedikt Obermayer

Over the past decade, single-cell transcriptomics has become a widely used technology for investigating cell type–specific changes in gene expression. Despite its popularity, the power of differential expression analyses in single-cell data is often limited by inherent technical challenges (such as high levels of missing data, low capture efficiency, and small numbers of cells) and by an insufficient number of biological replicates (see for instance Squair et al. 2021; Wu et al. 2025).

In this work, Germain and colleagues (2025) demonstrate how leveraging existing bulk RNA-seq datasets generated under the same experimental conditions can enhance the power and robustness of single-cell differential expression analyses. They used bulk RNA-seq data to design several hypothesis-grouping strategies informed by the significance of gene expression changes, and paired these strategies with different p-value adjustment methods. Overall, their analyses show that all strategies based on grouped false discovery rate correction, that is, where the genes in the single-cell dataset are grouped according to their significance in bulk and corrected in each group independently, can leverage bulk RNA-seq data to increase the power of single-cell differential expression analyses.
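To make the idea of grouped false discovery rate correction concrete, here is a minimal Python sketch. It assumes the simplest possible grouping, two groups split at a bulk p-value cutoff, and applies the Benjamini-Hochberg procedure within each group. The function names, the cutoff and the toy p-values are illustrative assumptions; this is not the muscat implementation, and the preprint evaluates several other grouping strategies.

```python
from collections import defaultdict


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def grouped_fdr(sc_pvalues, bulk_pvalues, bulk_cutoff=0.05):
    """BH-correct single-cell p-values separately within bulk-defined groups.

    Assumption: two groups only, defined by whether the gene's bulk p-value
    is at or below `bulk_cutoff` (a hypothetical parameter).
    """
    groups = defaultdict(list)
    for gene in sc_pvalues:
        # Genes absent from the bulk data join the "not significant" group.
        significant_in_bulk = bulk_pvalues.get(gene, 1.0) <= bulk_cutoff
        groups[significant_in_bulk].append(gene)
    adjusted = {}
    for genes in groups.values():
        padj = benjamini_hochberg([sc_pvalues[g] for g in genes])
        adjusted.update(zip(genes, padj))
    return adjusted


# Toy data (made up for illustration): genes A and B are significant in bulk.
sc = {"A": 0.01, "B": 0.02, "C": 0.03, "D": 0.04, "E": 0.5, "F": 0.9}
bulk = {"A": 0.001, "B": 0.01, "C": 0.8, "D": 0.9, "E": 0.7, "F": 0.6}
print(grouped_fdr(sc, bulk))
print(benjamini_hochberg(list(sc.values())))
```

With these toy values, gene A has an adjusted p-value of about 0.06 under a single global correction but about 0.02 when corrected within the bulk-significant group, which illustrates how the grouping can increase power.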

All of these hypothesis-grouping strategies are now implemented in muscat (Crowell et al. 2020), a popular R package that implements multiple tests for differential expression analyses in single-cell data.

               

References

Crowell HL, Soneson C, Germain P-L, Calini D, Collin L, Raposo C, Malhotra D, Robinson MD (2020) muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data. Nature Communications, 11, 6077. https://doi.org/10.1038/s41467-020-19894-4

Germain P-L, Wang J, Robinson MD (2025) Bulk-based hypothesis weighing increases power in single-cell differential expression analysis. bioRxiv, ver. 4 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/2025.04.15.648932

Squair JW, Gautier M, Kathe C, Anderson MA, James ND, Hutson TH, Hudelle R, Qaiser T, Matson KJE, Barraud Q, Levine AJ, La Manno G, Skinnider MA, Courtine G (2021) Confronting false discoveries in single-cell differential expression. Nature Communications, 12, 5692. https://doi.org/10.1038/s41467-021-25960-2

Wu C-H, Zhou X, Chen M (2025) Exploring and mitigating shortcomings in single-cell differential expression analysis with a new statistical paradigm. Genome Biology, 26, 58. https://doi.org/10.1186/s13059-025-03525-6

 

Authors: Pierre-Luc Germain, Jiayi Wang, Mark D. Robinson
Abstract: Due to the costs of single-cell sequencing, sample sizes are often relatively limited, sometimes leading to poorly reproducible results. In many contexts, however, larger bulk RNAseq data is available for the same conditions or experimental par...
Thematic fields: Bioinformatics
Recommender: Mireya Plass
Submission date: 2025-04-21 08:37:49
17 Nov 2025

Evidencing strain-dependency of metabolic pathways within 1,494 lactic bacteria genomes with the in silico screening Prolipipe pipeline

Towards an exploration of metabolic pathways at the strain level in bacterial genomes

Recommended by Hélène Chiapello based on reviews by 2 anonymous reviewers

Over the last 15 years, tools for genome-scale metabolic reconstructions of individual microorganisms or small microbial consortia have been extensively developed (Thiele et al. 2010; Machado et al. 2018; Frioux et al. 2020; Bernstein et al. 2021; Zimmermann et al. 2021; Quinn-Bohmann et al. 2025). However, the quality of the results depends heavily on each protocol’s ability to account for different sources of uncertainty, such as the quality of functional (meta)genome annotations and pathway databases, and the precise description of environmental features (pH, temperature, O2 and chemical compound availability; Zimmermann et al. 2021). Most tools require manual curation (Frioux et al. 2020). Furthermore, several key challenges remain: (i) gaps and incompleteness in genomic and biochemical data for less characterized or uncultured microorganisms; (ii) scalability and computational complexity when modeling large microbial consortia; (iii) strain-resolved metabolic modeling, especially in non-model species; and (iv) the development of automated and standardized reconstruction pipelines that maintain model quality and reproducibility (Bernstein et al. 2021).

Robert et al. (2025) address the specific issue of bacterial community-scale metabolic model reconstruction. Their work aims to assemble metabolic networks from large-scale datasets (typically with more than 1,000 genomes) and focuses on strain-level metabolic capacities. It relies on a bioinformatics pipeline called Prolipipe that uses standard tools and databases for functional and metabolic annotation, such as the eggNOG (Huerta-Cepas et al. 2019) and MetaCyc (Caspi et al. 2020) databases, Pathway Tools (Karp et al. 2021) and the PADMet toolbox (Aite et al. 2018). The pipeline aggregates three draft genome-scale metabolic predictions to create a consensus genome-scale metabolic model per strain, thus facilitating pathway interpretation among strains of a given species.

During the peer-review process, the reviewers particularly appreciated the authors doing the following:

  1. Providing a robust and FAIR pipeline (Wilkinson et al. 2016) to predict metabolic potential on a large scale by focusing on specific metabolic pathways.
  2. Evaluating strain variability in the metabolic potential of targeted pathways in strains of the same species.
  3. Providing new representations for visualizing this kind of analysis.

Robert et al. (2025) present an application of the pipeline to 1,494 lactic acid bacterial genomes, offering interesting insights into the distribution of pathway completion rates within this taxonomic group. By focusing on the L-arginine biosynthesis metabolic pathway, they demonstrate the pipeline's ability to detect species exhibiting intraspecific variability in this pathway. 
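The notion of a pathway completion rate, and of intraspecific variability in it, can be illustrated with a short Python sketch. It assumes that the completion rate is the fraction of a pathway's reactions found in a strain's metabolic network; this definition, the function names and the reaction identifiers (R1 to R4) are hypothetical and are not taken from Prolipipe.

```python
from statistics import pstdev


def completion_rate(pathway_reactions, strain_reactions):
    """Fraction of a pathway's reactions found in one strain's network."""
    pathway = set(pathway_reactions)
    if not pathway:
        raise ValueError("pathway has no reactions")
    return len(pathway & set(strain_reactions)) / len(pathway)


def intraspecific_spread(pathway_reactions, strains_by_species):
    """Per species: (min, max, population SD) of completion rates across strains."""
    summary = {}
    for species, strains in strains_by_species.items():
        rates = [completion_rate(pathway_reactions, rxns) for rxns in strains.values()]
        summary[species] = (min(rates), max(rates), pstdev(rates))
    return summary


# Hypothetical pathway and strains, for illustration only.
pathway = ["R1", "R2", "R3", "R4"]
strains = {
    "species_X": {"strain_a": {"R1", "R2", "R3", "R4"}, "strain_b": {"R1", "R2"}},
    "species_Y": {"strain_c": {"R1", "R2", "R3"}, "strain_d": {"R1", "R2", "R3"}},
}
for species, (low, high, sd) in intraspecific_spread(pathway, strains).items():
    print(species, low, high, sd)
```

In this toy example, species_X shows intraspecific variability (rates of 0.5 and 1.0) while species_Y does not (both strains at 0.75). A species-level summary of this kind is one simple way to flag candidates for closer inspection.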

In conclusion, this work constitutes a useful in silico decision-support tool for prioritizing strains of interest based on their gene-reaction associations, which has great potential for many other microbial metabolic applications.

                
References

Aite M, Chevallier M, Frioux C, Trottier C, Got J, Cortés MP, Mendoza SN, Carrier G, Dameron O, Guillaudeux N, Latorre M, Loira N, Markov GV, Maass A, Siegel A (2018) Traceability, reproducibility and wiki-exploration for “à-la-carte” reconstructions of genome-scale metabolic models. PLOS Computational Biology, 14, e1006146. https://doi.org/10.1371/journal.pcbi.1006146

Bernstein DB, Sulheim S, Almaas E, Segrè D (2021) Addressing uncertainty in genome-scale metabolic model reconstruction and analysis. Genome Biology, 22, 64. https://doi.org/10.1186/s13059-021-02289-z

Caspi R, Billington R, Keseler IM, Kothari A, Krummenacker M, Midford PE, Ong WK, Paley S, Subhraveti P, Karp PD (2020) The MetaCyc database of metabolic pathways and enzymes - a 2019 update. Nucleic Acids Research, 48, D445–D453. https://doi.org/10.1093/nar/Gkz862

Frioux C, Singh D, Korcsmaros T, Hildebrand F (2020) From bag-of-genes to bag-of-genomes: metabolic modelling of communities in the era of metagenome-assembled genomes. Computational and Structural Biotechnology Journal, 18, 1722–1734. https://doi.org/10.1016/j.csbj.2020.06.028

Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, Cook H, Mende DR, Letunic I, Rattei T, Jensen LJ, von Mering C, Bork P (2019) eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research, 47, D309–D314. https://doi.org/10.1093/nar/Gky1085

Karp PD, Midford PE, Billington R, Kothari A, Krummenacker M, Latendresse M, Ong WK, Subhraveti P, Caspi R, Fulcher C, Keseler IM, Paley SM (2021) Pathway Tools version 23.0 update: software for pathway/genome informatics and systems biology. Briefings in Bioinformatics, 22, 109–126. https://doi.org/10.1093/bib/bbz104

Machado D, Andrejev S, Tramontano M, Patil KR (2018) Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. Nucleic Acids Research, 46, 7542–7553. https://doi.org/10.1093/nar/gky537

Quinn-Bohmann N, Carr AV, Diener C, Gibbons SM (2025) Moving from genome-scale to community-scale metabolic models for the human gut microbiome. Nature Microbiology, 10, 1055–1066. https://doi.org/10.1038/s41564-025-01972-2

Robert N, Got J, Hamon-Giraud P, Falentin H, Siegel A (2025) Evidencing strain-dependency of metabolic pathways within 1,494 lactic bacteria genomes with the in silico screening Prolipipe pipeline. HAL, ver. 3 peer-reviewed and recommended by PCI Genomics. https://hal.science/hal-05045657v3

Thiele I, Palsson BØ (2010) A protocol for generating a high-quality genome-scale metabolic reconstruction. Nature Protocols, 5, 93–121. https://doi.org/10.1038/nprot.2009.203

Wilkinson MD, Dumontier M, Aalbersberg IjJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten J-W, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJG, Groth P, Goble C, Grethe JS, Heringa J, ’t Hoen PAC, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone S-A, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B (2016) The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18

Zimmermann J, Kaleta C, Waschina S (2021) gapseq: informed prediction of bacterial metabolic pathways and reconstruction of accurate metabolic models. Genome Biology, 22, 81. https://doi.org/10.1186/s13059-021-02295-1 

Authors: Noé Robert, Jeanne Got, Pauline Hamon-Giraud, Hélène Falentin, Anne Siegel
Abstract: Genomes from bacteria of interest to the food industry exhibit significant functional variability, yet evaluating this characteristic remains challenging. As public repositories continue to accumulate more genomes, large-scale assessment of met...
Thematic fields: Bacteria and archaea, Bioinformatics
Recommender: Hélène Chiapello
Submission date: 2025-04-25 08:21:44
06 Nov 2025

GrAnnoT, a tool for efficient and reliable annotation transfer through pangenome graph

GrAnnoT: Efficient annotation transfer through pangenome graphs

Recommended by Rayan Chikhi based on reviews by Guillaume Gautreau and 2 anonymous reviewers

Summary

Marthe et al. (2025) present GrAnnoT, a tool for transferring genomic annotations between genomes using pangenome variation graphs. As variation graphs become standard for representing intraspecific diversity, the propagation of annotation between genomes represented in those graphs is essential. GrAnnoT transfers existing annotations from and to linear genomes using a pangenome graph as a medium, instead of calling annotations de novo or doing liftover.


Why This Matters

Variation graphs lack the annotation toolset available for bacterial pangenomes (PPanGGOLiN [Gautreau et al. 2020], ggCaller [Horsfield et al. 2023]) or linear genome transfers (Liftoff [Shumate and Salzberg 2021]). GrAnnoT fills this gap for eukaryotic pangenomics. The tool is fast, conservative in its transfers, and provides useful outputs including presence-absence matrices and variant alignments. Benchmarking against Liftoff, VG (Garrison et al. 2018), ODGI (Guarracino et al. 2022), and GraphAligner (Rautiainen and Marschall 2020) across rice, human, and E. coli datasets demonstrates that GrAnnoT is reliable and efficient for syntenic regions. Code, data, and methods are well documented.


Limitations

The authors transparently acknowledge that non-syntenic elements (transposable elements, interchromosomal translocations) are better handled by Liftoff. The tool is currently validated primarily for minigraph-cactus graphs. Compatibility with PanGenome Graph Builder graphs (Garrison et al. 2024) is limited, and scalability beyond 69 A. thaliana genomes is untested. These limitations are appropriate for a specialized tool and are clearly stated in the revised Discussion section.


Peer Review

Three reviewers provided thorough feedback. The authors substantially revised the manuscript, clarifying scope, adding an extensive Discussion section, improving figures, and transparently addressing performance comparisons. While some reviewers requested additional validation beyond the stated scope, the authors provided reasonable justifications for their methodological choices.


Recommendation
GrAnnoT is a valuable addition to the pangenome analysis toolkit. It solves a real problem efficiently within its domain and will be useful for the growing community working with eukaryotic pangenome graphs. The authors have been responsive to feedback and maintained high standards for reproducibility. I recommend this preprint for PCI Genomics.

                        
References
Garrison E, Guarracino A, Heumos S, Villani F, Bao Z, Tattini L, Hagmann J, Vorbrugg S, Marco-Sola S, Kubica C, Ashbrook DG, Thorell K, Rusholme-Pilcher RL, Liti G, Rudbeck E, Golicz AA, Nahnsen S, Yang Z, Mwaniki MN, Nobrega FL, Wu Y, Chen H, de Ligt J, Sudmant PH, Huang S, Weigel D, Soranzo N, Colonna V, Williams RW, Prins P (2024) Building pangenome graphs. Nature Methods, 21, 2008–2012. https://doi.org/10.1038/s41592-024-02430-3 

Garrison E, Sirén J, Novak AM, Hickey G, Eizenga JM, Dawson ET, Jones W, Garg S, Markello C, Lin MF, Paten B, Durbin R (2018) Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nature Biotechnology, 36, 875–879. https://doi.org/10.1038/nbt.4227 

Gautreau G, Bazin A, Gachet M, Planel R, Burlot L, Dubois M, Perrin A, Médigue C, Calteau A, Cruveiller S, Matias C, Ambroise C, Rocha EPC, Vallenet D (2020) PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph. PLoS Computational Biology, 16, e1007732. https://doi.org/10.1371/journal.pcbi.1007732 

Guarracino A, Heumos S, Nahnsen S, Prins P, Garrison E (2022) ODGI: understanding pangenome graphs. Bioinformatics, 38, 3319–3326. https://doi.org/10.1093/bioinformatics/btac308 

Horsfield ST, Tonkin-Hill G, Croucher NJ, Lees JA (2023) Accurate and fast graph-based pangenome annotation and clustering with ggCaller. Genome Research, 33, 1622–1637. https://doi.org/10.1101/gr.277733.123

Marthe N, Zytnicki M, Sabot F (2025) GrAnnoT, a tool for efficient and reliable annotation transfer through pangenome graph. bioRxiv, ver. 3 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/2025.02.26.640337

Rautiainen M, Marschall T (2020) GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biology, 21, 253. https://doi.org/10.1186/s13059-020-02157-2 

Shumate A, Salzberg SL (2021) Liftoff: accurate mapping of gene annotations. Bioinformatics, 37, 1639–1643. https://doi.org/10.1093/bioinformatics/btaa1016 

Authors: Nina Marthe, Matthias Zytnicki, Francois Sabot
Abstract: The increasing availability of genome sequences has highlighted the limitations of using a single reference genome to represent the diversity within a species. Pangenomes, encompassing the genomic information from multiple genomes, thus offer a...
Thematic fields: Bioinformatics
Recommender: Rayan Chikhi
Submission date: 2025-03-03 10:36:55
21 Sep 2025

HaploCharmer: a Snakemake workflow for read-scale haplotype calling adapted to polyploids

Haplotype calling in complex polyploids using a single streamlined workflow

Recommended by Lucia Campos Dominguez based on reviews by Dongyan Zhao and 1 anonymous reviewer

When genotyping using high-throughput sequencing data, alleles from adjacent variants can be combined into haplotypes when they occur on the same sequencing read. Compared to bi-allelic single-nucleotide polymorphisms, haplotypes capture richer genomic information in a more compact representation. As a result, haplotypes facilitate deeper insights into linkage disequilibrium and lead to improved outcomes in genome-wide association studies, better genomic prediction and genetic mapping resolution and, consequently, improved population structure inference (Bhat et al. 2021; Gattepaille and Jakobsson 2012). In polyploid species, haplotypes are especially valuable since homologous copies cannot be distinguished by a single bi-allelic variant.

Haplotype calling in polyploids can, however, be computationally challenging, as the solution space expands sharply with increasing ploidy levels and number of variants. Existing haplotype callers designed for polyploid genomes cannot accommodate more than two alleles at a locus, limiting their ability to reconstruct longer haplotypes (Clevenger et al. 2018). To overcome this limitation, Rio et al. (2025) present HaploCharmer, a Snakemake-based workflow (Köster and Rahmann 2012; Mölder et al. 2021) for read-scale haplotype calling. This pipeline calls haplotypes within pre-defined genomic regions that are smaller than a sequencing read. It does so by combining established tools for mapping and variant calling with custom scripts for processing and filtering.
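The principle of read-scale haplotype calling can be sketched in a few lines of Python: for every read that spans all variant positions of a region, the alleles observed on that read are concatenated into a haplotype, and haplotypes are then counted. This is only an illustration under strong simplifying assumptions (gap-free alignments, a hypothetical `min_count` filter, toy reads); it is not the HaploCharmer implementation.

```python
from collections import Counter


def call_read_haplotypes(reads, variant_positions, min_count=2):
    """Count haplotypes carried by reads that span every variant in a region.

    reads: list of (start, sequence) tuples with a 0-based reference start.
           Assumption: alignments are gap-free (no indels, no clipping).
    variant_positions: 0-based reference positions of the region's variants.
    min_count: hypothetical filter; haplotypes seen on fewer reads are dropped.
    """
    positions = sorted(variant_positions)
    first, last = positions[0], positions[-1]
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq)  # exclusive end on the reference
        if start > first or end <= last:
            continue  # the read does not cover the whole region
        haplotype = "".join(seq[pos - start] for pos in positions)
        counts[haplotype] += 1
    return {hap: n for hap, n in counts.items() if n >= min_count}


# Toy reads (made up): two variants at reference positions 3 and 6.
reads = [
    (0, "ACGTACGTAC"),  # carries T at 3 and G at 6
    (1, "CGTACGTA"),    # carries T at 3 and G at 6
    (2, "GAACCTTA"),    # carries A at 3 and C at 6
    (2, "GAACCTTA"),    # carries A at 3 and C at 6
    (4, "ACGTAC"),      # starts after the first variant, so it is skipped
    (0, "ACGAACT"),     # carries A at 3 and T at 6, seen once only
]
print(call_read_haplotypes(reads, [3, 6]))
```

Because each haplotype is read directly from single reads, nothing limits a region to two alleles, which is what matters in polyploids. The count filter stands in, very loosely, for the filtering step that removes haplotypes likely to arise from sequencing errors.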

The effectiveness of HaploCharmer is demonstrated using a re-sequencing dataset from a progeny of 96 individuals resulting from the self-fertilisation of the sugarcane cultivar R570 (genotyped across over 80,000 regions). Results show that HaploCharmer accurately calls haplotypes in sugarcane with very few false positives, and that it can also increase the detection of informative single-dose haplotypes compared to single-variant approaches. Although distinguishing dosage classes is still challenging, this tool provides high-quality genotyping suitable for genetic mapping in complex polyploids. Using these data, single-dose haplotypes were grouped into co-segregation groups to generate a genetic map for this complex polyploid. Moreover, HaploCharmer was tested on a diversity panel of 307 polyploids from Saccharum and related genera, which was genotyped and successfully resolved into major species groups.

Working on recent or complex polyploids always poses bioinformatic challenges due to the high number of duplicated sequences and haplotypes. In this context, HaploCharmer represents a practical and scalable solution for haplotype genotyping in highly complex polyploid genomes. Earlier approaches that attempted to optimise variant calling for polyploids relied on only 2-allele haplotypes, or on previously generated single-nucleotide polymorphism arrays (Clevenger et al. 2018; Voorrips and Tumino 2022). In contrast, HaploCharmer provides a simpler, more straightforward approach that allows the construction of longer haplotypes and requires only short-read re-sequencing data, a reference genome and its pre-defined genomic regions (phase sets).

        

References

Bhat JA, Yu D, Bohra A, Ganie SA, Varshney RK (2021) Features and applications of haplotypes in crop breeding. Communications Biology, 4, 1266. https://doi.org/10.1038/s42003-021-02782-y

Clevenger JP, Korani W, Ozias-Akins P, Jackson S (2018) Haplotype-based genotyping in polyploids. Frontiers in Plant Science, 9. https://doi.org/10.3389/fpls.2018.00564

Gattepaille LM, Jakobsson M (2012) Combining markers into haplotypes can improve population structure inference. Genetics 190, 159–174. https://doi.org/10.1534/genetics.111.131136

Köster J, Rahmann S (2012) Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522. https://doi.org/10.1093/bioinformatics/bts480

Mölder F, Jablonski KP, Letcher B, Hall MB, Tomkins-Tinch CH, Sochat V, Forster J, Lee S, Twardziok SO, Kanitz A, Wilm A, Holtgrewe M, Rahmann S, Nahnsen S, Köster J (2021) Sustainable data analysis with Snakemake. F1000Research 10, 33. https://doi.org/10.12688/f1000research.29032.2

Rio S, Abdallah S, Durand T, D’Hont A, Garsmeur O (2025) HaploCharmer: a Snakemake workflow for read-scale haplotype calling adapted to polyploids. bioRxiv, ver. 2 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/2025.03.14.642807

Voorrips RE, Tumino G (2022) PolyHaplotyper: haplotyping in polyploids based on bi-allelic marker dosage data. BMC Bioinformatics, 23, 442. https://doi.org/10.1186/s12859-022-04989-0

Authors: Simon Rio, Sophie Abdallah, Théo Durand, Angélique D'Hont, Olivier Garsmeur
Abstract: The advent of next-generation sequencing (NGS) has revolutionized the study of single nucleotide polymorphisms (SNPs), making it increasingly cost-effective. Haplotypes, which combine alleles from adjacent variants, offer several advantages ove...
Thematic fields: Bioinformatics
Recommender: Lucia Campos Dominguez
Submission date: 2025-03-19 15:57:43
18 Jul 2025

localScore: an R package to highlight optimal and suboptimal segment in a sequence with associated p-values computation

localScore: finding optimal segments in genetic sequences

Recommended by Sishuo Wang based on reviews by Maria Ines Fariello and PCI Genomics

Robelin et al. (2025) propose localScore, an R package for detecting atypical segments of a sequence. Detecting unusual patterns in genetic sequences is a longstanding challenge. Unlike sliding-window methods, which require manual tuning and thus prior experience and familiarity with the data, localScore computes local/suboptimal scores with positions, calculates P-values via multiple statistical methods, and automatically selects an optimal approach for a given sequence, thereby streamlining the analysis and making advanced sequence detection accessible to both experts and non-specialists.
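For readers unfamiliar with the local score, the following Python sketch shows the quantity at the heart of this kind of analysis: the highest-scoring contiguous segment of a sequence of scores, obtained with the Lindley recursion W_k = max(0, W_{k-1} + x_k). The function name and the toy scores are illustrative assumptions. The sketch does not reproduce the localScore R package, which also reports suboptimal segments and computes p-values.

```python
def local_score(scores):
    """Return (best_score, start, end) of the highest-scoring segment.

    Uses the Lindley recursion W_k = max(0, W_{k-1} + x_k); the local score
    is the maximum of W. Indices are 0-based and inclusive. If no score is
    positive, the result is (0, None, None).
    """
    best, best_start, best_end = 0, None, None
    running, start = 0, 0
    for i, x in enumerate(scores):
        if running == 0:
            start = i  # a new excursion can only begin where W is zero
        running = max(0, running + x)
        if running > best:
            best, best_start, best_end = running, start, i
    return best, best_start, best_end


# Toy scores (made up). Negative values dominate on average, as the
# theory for local scores usually assumes.
scores = [-1, 2, 3, -4, 1, -2, 4, 2, -3]
print(local_score(scores))
```

No window length has to be chosen: the segment boundaries come out of the recursion itself, which is the practical advantage over sliding-window approaches mentioned above.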

Through practical examples, Robelin and colleagues demonstrate the package's utility in genomics and epidemiology, where atypical segments (e.g., pathogenicity islands or recombination hotspots) vary in length. The package’s flexible modeling, robust suboptimal segment analysis, and open-source accessibility ensure broad applicability across diverse research domains.

                

References

Robelin D, Déjean S, Mercier S (2025) localScore: an R package to highlight optimal and suboptimal segment in a sequence with associated p-values computation. HAL, ver. 4 peer-reviewed and recommended by PCI Genomics. https://hal.science/hal-04723307

Authors: David Robelin, Sébastien Déjean, Sabine Mercier
Abstract: Highlighting atypical segments of a sequence is an important goal in very diverse domains. In the case where no prior information on the length of the segment to be highlighted is known, Karlin and Altschul defined, in 1990, the local...
Thematic fields: Bioinformatics
Recommender: Sishuo Wang
Submission date: 2024-10-23 14:21:22
20 Jun 2025
POSTPRINT

Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy

A new Galaxy workflow to generate and evaluate reference genome assemblies

Recommended by ORCID_LOGO, and ORCID_LOGO

Alba Marino (1), Capucine Mayoud (1), Anna-Sophie Fiston-Lavier (1,2)

(1) ISEM, Univ Montpellier, CNRS, IRD, Montpellier, France

(2) Institut Universitaire de France

Biodiversity is the bedrock of many ecosystem services fundamental to human society. Acquiring genome-level information appears increasingly important for a deeper understanding of biodiversity and for planning conservation actions for endangered species (Lewin et al. 2022). Consortia such as the Vertebrate Genomes Project (VGP; Rhie et al. 2021) and the European Reference Genome Atlas (ERGA; Formenti et al. 2022) have been established to coordinate global efforts toward sequencing all existing vertebrate and European eukaryotic species, respectively. However, generating genome-scale data across such a wide taxonomic range presents significant challenges, not least the development and long-term maintenance of computational tools and workflows that ensure both reproducibility and transparency.

Galaxy offers a user-friendly, web-based environment for executing complex pipelines in a reproducible way, as well as servers for data storage (Bray & Maier 2023). In this context, Larivière et al. (2024) present a major enhancement to reference genome assembly with the development of a scalable, accessible, and reproducible pipeline embedded within the Galaxy platform. The framework has been designed to democratize the production of high-quality genomes, in line with initiatives such as the Earth BioGenome Project (Lewin et al. 2022). It integrates six main stages, namely (1) k-mer genome profiling, (2) phased assembly construction, (3) artefactual duplication purging, (4) scaffolding, (5) decontamination, and (6) mitogenome assembly. The pipeline builds on the expertise of VGP (Rhie et al. 2021) and ERGA (Formenti et al. 2022), while incorporating recent advances in high-fidelity long-read sequencing technologies.

A key strength of the pipeline lies in its open availability and modularity, which enable end-to-end processing from raw reads to curated assemblies while emphasizing reproducibility, transparency, and ease of use (Afgan et al. 2018). Another major advantage is the integration of quality control steps throughout the pipeline. Moreover, the system is designed to accommodate a wide range of input data types and is applicable to a broad spectrum of species (Larivière et al. 2024).

Several public Galaxy instances are available worldwide (e.g. in the USA: https://usegalaxy.org; in Europe: https://usegalaxy.eu; in Australia: https://usegalaxy.org.au). These platforms provide free access to computing resources for running complex workflows and analysing large datasets. Nonetheless, certain steps in genome assembly may require more memory (RAM) or processing power (CPU) than the instances can offer, thus demanding access to high-performance computing (HPC) environments. Although cloud execution is mentioned as a means of processing large amounts of data, the manuscript offers little detail on deployment costs or potential technical barriers. 

Beyond technical and financial considerations, the environmental impact of scaling up genome sequencing and assembly also deserves attention. As more projects are launched and reliance on cloud infrastructure increases, the demand for computing, data storage, and long-term archival will increase substantially. Such operations are energy-intensive and contribute significantly to the environmental footprint of computational biology (Lannelongue & Inouye 2023). While Larivière et al. (2024) rightly emphasize accessibility and scalability, the community must also consider sustainability strategies to limit the ecological impact of large-scale genome initiatives. 

The authors suggest that the pipeline can be adapted for non-vertebrate species, such as plants or fungi, by adjusting a few parameters (e.g. BUSCO clade selection). However, the pipeline has so far only been validated on vertebrate genomes. Its robustness across taxa with complex genomic features, such as extreme GC content, polyploidy, or high repeat density, will require further benchmarking. Finally, another challenge is keeping the pipeline up to date. The rapid evolution of genome assembly tools (Nurk et al. 2022) contrasts with the often slower update cycles of Galaxy workflows, raising concerns about maintaining best practice standards without active long-term governance. The pipeline would benefit from an additional step to compare the established Galaxy pipeline with new assembly tools better suited to data generated using the latest technologies.

In conclusion, Larivière et al. (2024) offer a vital step forward in making reference-quality genome assembly broadly accessible. It is now in the hands of the community to address the remaining open challenges, such as computational accessibility, broader taxonomic validation, environmental sustainability, and future-proofing of the pipeline.

                         
References

Afgan E, Baker D, Batut B, van den Beek M, Bouvier D, Čech M, Chilton J, Clements D, Coraor N, Grüning BA, Guerler A, Hillman-Jackson J, Hiltemann S, Jalili V, Rasche H, Soranzo N, Goecks J, Taylor J, Nekrutenko A, Blankenberg D (2018) The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Research, 46, W537–W544. https://doi.org/10.1093/nar/gky379

Bray S, Maier W (2023) Automating Galaxy workflows using the command line. Galaxy Training Network. https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/workflow-automation/tutorial.html

Formenti G, Theissinger K, Fernandes C, Bista I, Bombarely A, Bleidorn C, et al. (2022) The era of reference genomes in conservation genomics. Trends in Ecology & Evolution, 37, 197–202. https://doi.org/10.1016/j.tree.2021.11.008

Lannelongue L, Inouye M (2023) Carbon footprint estimation for computational research. Nature Reviews Methods Primers, 3, 9. https://doi.org/10.1038/s43586-023-00202-5

Larivière D, Abueg L, Brajuka N, Gallardo-Alba C, Grüning B, Ko BJ, Ostrovsky A, Palmada-Flores M, Pickett BD, Rabbani K, Antunes A, Balacco JR, Chaisson MJP, Cheng H, Collins J, Couture M, Denisova A, Fedrigo O, Gallo GR, Giani AM, Gooder GM, Horan K, Jain N, Johnson C, Kim H, Lee C, Marques-Bonet T, O’Toole B, Rhie A, Secomandi S, Sozzoni M, Tilley T, Uliano-Silva M, van den Beek M, Williams RW, Waterhouse RM, Phillippy AM, Jarvis ED, Schatz MC, Nekrutenko A, Formenti G (2024) Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy. Nature Biotechnology, 42, 367–370. https://doi.org/10.1038/s41587-023-02100-3

Lewin HA, Richards S, Lieberman Aiden E, Allende ML, et al. (2022) The Earth BioGenome Project 2020: Starting the clock. Proceedings of the National Academy of Sciences, 119, e2115635118. https://doi.org/10.1073/pnas.2115635118

Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, et al. (2022) The complete sequence of a human genome. Science, 376, 44–53. https://doi.org/10.1126/science.abj6987

Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, et al. (2021) Towards complete and error-free genome assemblies of all vertebrate species. Nature, 592, 737–746. https://doi.org/10.1038/s41586-021-03451-0

Authors: Delphine Larivière, Linelle Abueg, Nadolina Brajuka, Cristóbal Gallardo-Alba, Bjorn Grüning, Byung June Ko, Alex Ostrovsky, Marc Palmada-Flores, Brandon D. Pickett, Keon Rabbani, Agostinho Antunes, Jennifer R. Balacco, Mark J. P. Chaisson, Haoyu C...
Abstract: Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is...
Thematic fields: Bioinformatics, ERGA
Recommender: Alba Marino
Submission date: 2025-05-06 22:47:39
16 Jun 2025

Sexual reproduction is controlled by successive transcriptomic waves in Podospora anserina

Transcriptomic reprogramming during fungal sexual reproduction

Recommended by Sébastien Bloyer based on reviews by 2 anonymous reviewers

Sexual reproduction is a hallmark of most eukaryotic life, making the study of its molecular mechanisms a central topic in biology. Filamentous fungi offer a particularly suitable system to explore this question (Peraza-Reyes and Malagnac 2016), not only because they are excellent model organisms, but also because many plant-pathogenic fungi require an annual sexual reproduction cycle to persist within ecosystems. In addition, elucidating and controlling the sexual reproduction of fungi that are relevant to the food industry or the production of industrially valuable compounds is crucial for ensuring their long-term evolutionary stability and for developing novel strains with enhanced or innovative properties.

To dissect the genetic regulation underlying sexual development, Bidard et al. (2025) performed a genome-wide transcriptomic analysis across ten key stages of the sexual cycle in the ascomycete Podospora anserina. This comprehensive temporal transcriptional profiling enabled high-resolution insights into transcriptional dynamics during fungal sexual differentiation.

Two complementary analytical strategies were employed: a data-driven approach and an expert-driven annotation. The data-driven analysis identified 3,466 differentially expressed (DE) genes out of the 10,507 annotated in the P. anserina genome, indicating that approximately one-third of the organism’s gene repertoire is differentially regulated at one or more points during sexual development. This finding highlights the existence of a major transcriptional program orchestrating sexual development, involving a substantial fraction of the genome. Among the 3,466 DE genes, 1,186 exhibited co-regulated expression profiles that could be categorized into five distinct transcriptional waves corresponding to key developmental stages: fertilization, dikaryon formation, karyogamy, meiosis, and ascospore formation and maturation. Finally, a co-expression network analysis focusing on transcription factors (TFs) revealed the critical involvement of several previously uncharacterized TFs, underscoring their potential regulatory roles during sexual development.

This study presents the first comprehensive transcriptomic analysis of sexual development in a pseudo-homothallic ascomycete fungus, which overcomes self-sterility by maintaining two compatible nuclei within a single mycelium. The authors subsequently investigated whether the DE genes identified in this species are evolutionarily conserved in the genomes of filamentous fungi employing distinct mating strategies, including two homothallic (self-fertile) species, Chaetomium globosum and Sordaria macrospora, and two heterothallic (self-sterile) species, Neurospora crassa and Trichoderma reesei. Ortholog searches among the DE genes during sexual reproduction in these fungi revealed that 2,957 DE genes (over 85%) in the pseudo-homothallic species belong to orthologous gene groups shared with at least one of the other species. Moreover, a conserved core set of 1,496 DE genes was identified across all four fungi, suggesting the existence of a shared regulatory framework underpinning sexual development in filamentous ascomycetes. The genes uncovered in this study will thus serve as a foundation for future functional genomics studies on sexual reproduction in fungi and possibly in other eukaryotes as well.

                                 

References

Bidard F, Grognet P, Lelandais G, Imbeaud S, Mucchielli M-H, Debuchy R, Berteaux-Lecellier V, Malagnac F (2025) Sexual reproduction is controlled by successive transcriptomic waves in Podospora anserina. bioRxiv, ver. 2 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/2024.12.09.627484

Peraza-Reyes L, Malagnac F (2016) Sexual development in fungi. Chapter 16 of Growth, Differentiation and Sexuality (ed Wendland J), pp. 407–455. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-25844-7_16

Title: Sexual reproduction is controlled by successive transcriptomic waves in Podospora anserina
Authors: Frédérique Bidard, Pierre Grognet, Gaëlle Lelandais, Sandrine Imbeaud, Marie-Hélène Muchielli, Robert Debuchy, Véronique Berteaux-Lecellier and Fabienne Malagnac
Abstract: Despite the inherent challenge of finding suitable mating partners, most eukaryotes use sexual reproduction to produce offspring endowed with increase genetic diversity and fitness. The persistence of this mode of reproduction is a key question...
Thematic fields: Bioinformatics, Functional genomics, Fungi
Recommender: Sébastien Bloyer
Submission date: 2024-12-12 12:19:13
22 May 2025
POSTPRINT

The genome sequence of the Violet Carpenter Bee, Xylocopa violacea (Linnaeus, 1785): a hymenopteran species undergoing range expansion

A high-quality genome assembly for carpenter bees

Recommended by Christian Lopezguerra and Gavin M. Douglas

Christian Lopezguerra (1) and Gavin M. Douglas (2, 3)

(1) Department of Plant and Microbial Biology; (2) Department of Biological Sciences; (3) Bioinformatics Research Center, North Carolina State University, USA

Climate change and anthropogenic stressors are driving rapid biodiversity loss and dynamic shifts in species ranges (Outhwaite et al. 2022). Partially in response to the decline in biodiversity, the European Reference Genome Atlas (ERGA) has been generating high-quality accessible genome resources, allowing for a more collaborative network and assisting conservation efforts.

One recent target was the violet carpenter bee (Xylocopa violacea; Figure 1), one of the many insects that have shown a recent expansion in their range within Europe. This species is a key pollinator and is therefore of great interest for ecological and agricultural purposes. In addition, anticancer research with melittin variants present in the venom of the violet carpenter bee shows potential (Erkoc et al. 2022; von Reumont et al. 2022). However, genetic analyses have been limited by the prior contig-level assembly of the genome (Koludarov et al. 2023). Developing a high-quality, annotated reference genome for the carpenter bee was the goal of Nash and colleagues’ (2024) research, as part of the European Reference Genome Atlas initiative.

Figure 1: Violet carpenter bee (in Margarida, Spain). Copyright Susanne Vogel, a photographer who made this available on iNaturalist.com. Distributed under a CC-BY 4.0 license.

The authors coupled long- and short-read sequencing techniques to create an improved assembly. In particular, they used both short-read RNA-seq and long-read Iso-Seq for gene annotation. They also used Hi-C sequencing to capture chromosome conformational information to aid scaffolding. Their final assembly contains 1,300 scaffolds and has a BUSCO completeness level of 99.75% (Manni et al. 2021), aligning with the standards of the European Reference Genome Atlas. The authors generated a 1.02 gigabase assembly, which was much larger than the expected size of 672 megabases based on k-mer content. The authors partially explain the difference by the high repeat content in the genome, particularly specific 109-mer and 217-mer repeats. Due to this high repeat content, the authors could not assemble full chromosomes but instead produced 17 pseudo-chromosomes comprising 481.4 megabases (in addition to all other unlocalized scaffolds).

This high-quality reference genome will be valuable for future studies on population and functional genomics of carpenter bees (Xylocopa). Indeed, this is the first high-quality annotated pseudo-chromosomal genome assembly of the genus Xylocopa, which includes hundreds of other species. It will enable improved investigation into genomic signatures associated with shifting populations. More generally, this reference genome will be useful for comparative analyses with other Hymenoptera species.

             

References

Erkoc P, von Reumont BM, Lüddecke T, Henke M, Ulshöfer T, Vilcinskas A, Fürst R, Schiffmann S (2022) The pharmacological potential of novel melittin variants from the honeybee and solitary bees against inflammation and cancer. Toxins, 14, 818. https://doi.org/10.3390/toxins14120818

Koludarov I, Velasque M, Senoner T, Timm T, Greve C, Hamadou AB, Gupta DK, Lochnit G, Heinzinger M, Vilcinskas A, Gloag R, Harpur BA, Podsiadlowski L, Rost B, Jackson TNW, Dutertre S, Stolle E, von Reumont BM (2023) Prevalent bee venom genes evolved before the aculeate stinger and eusociality. BMC Biology, 21, 229. https://doi.org/10.1186/s12915-023-01656-5

Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM (2021) BUSCO update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Molecular Biology and Evolution, 38, 4647–4654. https://doi.org/10.1093/molbev/msab199

Nash WJ, Man A, McTaggart S, Baker K, Barker T, Catchpole L, Durrant A, Gharbi K, Irish N, Kaithakottil G, Ku D, Providence A, Shaw F, Swarbreck D, Watkins C, McCartney AM, Formenti G, Mouton A, Vella N, von Reumont BM, Vella A, Haerty W (2024) The genome sequence of the Violet Carpenter Bee, Xylocopa violacea (Linnaeus, 1785): a hymenopteran species undergoing range expansion. Heredity, 133, 381–387. https://doi.org/10.1038/s41437-024-00720-2

Outhwaite CL, McCann P, Newbold T (2022) Agriculture and climate change are reshaping insect biodiversity worldwide. Nature, 605, 97–102. https://doi.org/10.1038/s41586-022-04644-x

von Reumont BM, Dutertre S, Koludarov I (2022) Venom profile of the European carpenter bee Xylocopa violacea: Evolutionary and applied considerations on its toxin components. Toxicon: X, 14, 100117. https://doi.org/10.1016/j.toxcx.2022.100117

 

Title: The genome sequence of the Violet Carpenter Bee, Xylocopa violacea (Linnaeus, 1785): a hymenopteran species undergoing range expansion
Authors: Will J. Nash, Angela Man, Seanna McTaggart, Kendall Baker, Tom Barker, Leah Catchpole, Alex Durrant, Karim Gharbi, Naomi Irish, Gemy Kaithakottil, Debby Ku, Aaliyah Providence, Felix Shaw, David Swarbreck, Chris Watkins, Ann M. McCartney, Giulio F...
Abstract: We present a reference genome assembly from an individual male Violet Carpenter Bee (Xylocopa violacea, Linnaeus 1758). The assembly is 1.02 gigabases in span. 48% of the assembly is scaffolded into 17 pseu...
Thematic fields: Arthropods, ERGA, ERGA Pilot
Recommender: Christian Lopezguerra
Submission date: 2025-05-16 23:21:44
21 May 2025

Particular sequence characteristics induce bias in the detection of polymorphic transposable element insertions

A new simulation pipeline enhances benchmarking of transposon polymorphism detection tools

Recommended by Raúl Castanera based on reviews by Tianxiong Yu and 1 anonymous reviewer

Transposable Elements (TEs) are one of the main sources of genome variability. However, their study in populations has been hampered by the difficulty of properly detecting them using whole-genome re-sequencing data. Despite the expectations generated by the rise of long-read sequencing, it is becoming clear that such technologies will not replace short reads for analyzing large populations in the short term. Detecting Transposon Insertion Polymorphisms (TIPs) from short-read data is a challenging task because the repetitive nature of TE sequences complicates read mapping. Nevertheless, accurate TIP detection is essential for understanding the evolutionary dynamics of TEs, their regulatory roles and their link with phenotypic variability. In the past 15 years, more than 20 tools have been developed for TIP detection using short-read data, but only a few independent benchmarks have been performed so far (Chen et al. 2023; Nelson et al. 2017; Rishishwar et al. 2017; Vendrell-Mir et al. 2019). Previous benchmarks have used simulated and real data to evaluate tool performance, each with its own set of advantages and limitations. In particular, introducing artificial insertions and simulating genomic short reads may not reflect the nature of real TEs. By contrast, using real TE insertions as benchmarks can introduce bias since TE annotations are never perfect.

Verneret et al. (2025) introduce an original, alternative approach in which a comprehensive simulation method mimics the most important sequence features of real TEs and non-TE intergenic regions. This simulated data is then combined with true genic sequences, generating a pseudochromosome that can be used for benchmarking TIP detection pipelines. Using this approach, the authors eliminate the bias of TE annotation on real genomes, while preserving most of the characteristics of natural TEs. Using simulated pseudochromosomes for Drosophila melanogaster and Arabidopsis thaliana, Verneret et al. (2025) found that the performance of 14 commonly used TIP-calling tools is highly variable, with only a few performing well, and only at high sequencing depths. In addition, the authors analyzed the sequence features of true-positive and false-positive TIP calls, and found that specific TE sequence characteristics (e.g., length and age) affect the detection of both reference and non-reference TIPs.

The approach described by Verneret et al. (2025) is an important contribution to the field for several reasons. First, the results shown in the publication will help the users of such tools make informed decisions before launching their experiments. For more advanced users, it will enable future benchmarks to identify which tools perform best for different species, each with their own sequence characteristics. For software developers, the data released constitutes a valuable dataset for testing their tools under the same conditions. Finally, the identification of sequence characteristics enriched among false positives and false negatives also gives developers an opportunity to improve the performance of new tools by taking these specificities into account.

                          

References

Chen J, Basting PJ, Han S, Garfinkel DJ, Bergman CM (2023) Reproducible evaluation of transposable element detectors with McClintock 2 guides accurate inference of Ty insertion patterns in yeast. Mobile DNA, 14, 8. https://doi.org/10.1186/s13100-023-00296-4

Nelson MG, Linheiro RS, Bergman CM (2017) McClintock: An integrated pipeline for detecting transposable element insertions in whole-genome shotgun sequencing data. G3: Genes, Genomes, Genetics, 7, 2763–2778. https://doi.org/10.1534/g3.117.043893

Rishishwar L, Mariño-Ramírez L, Jordan IK (2017) Benchmarking computational tools for polymorphic transposable element detection. Briefings in Bioinformatics, 18, 908–918. https://doi.org/10.1093/bib/bbw072

Vendrell-Mir P, Barteri F, Merenciano M, González J, Casacuberta JM, Castanera R (2019) A benchmark of transposon insertion detection tools using real data. Mobile DNA, 10, 53. https://doi.org/10.1186/s13100-019-0197-9

Verneret M, Le VA, Faraut T, Turpin J, Lerat E (2025) Particular sequence characteristics induce bias in the detection of polymorphic transposable element insertions. bioRxiv, ver. 4 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/2024.09.25.614865

 

Title: Particular sequence characteristics induce bias in the detection of polymorphic transposable element insertions
Authors: Marie Verneret, Van Anthony Le, Thomas Faraut, Jocelyn Turpin, Emmanuelle Lerat
Abstract: Transposable elements (TEs) have an important role in genome evolution but are challenging for bioinformatics detection due to their repetitive nature and ability to move and replicate within genomes. New sequencing technologies now enable the ...
Thematic fields: Bioinformatics, Evolutionary genomics, Population genomics, Viruses and transposable elements
Recommender: Raúl Castanera
Submission date: 2024-09-30 08:29:19
20 May 2025

Draft genome and transcriptomic sequence data of three invasive insect species

Three more reference genomes of invasive insect species

Recommended by Vincent Lacroix based on reviews by Jean-Marc Aury and Nicolas Parisot

The number and prevalence of invasive species have risen in the last decades together with international trade (Hulme 2021). As invasive species, insects have received less attention than plants. In particular, there are fewer reference genomes currently available, although genomic resources can be useful to investigate invasion dynamics and develop more effective management strategies.

Lombaert et al. (2025) sequenced and assembled the genomes of three species: Cydalima perspectalis (the box tree moth), Leptoglossus occidentalis (the western conifer seed bug), and Tecia solanivora (the Guatemalan tuber moth), which have in common their rapid spread and severe impact on their respective host plants (boxwoods, conifers and potatoes). The authors generated PacBio HiFi reads, together with Hi-C data, to obtain assemblies meeting international quality standards. They also generated short-read RNA-seq data, which they used to provide initial structural annotations of the genes. The resulting reference genomes are still in a draft state because repeats and heterozygosity are notoriously hard to handle. The most challenging genome to assemble was L. occidentalis, with an estimated size of 1.5 Gb, an estimated repeat content of 58%, and an estimated heterozygosity of 1.8%. The raw data produced can still be analysed in more depth to further characterise the repeat content and the heterozygosity of these species.

These reference genomes can readily be used for identifying genetic markers of interest for a variety of applications. At a time of growing awareness that data production accounts for a significant part of the carbon footprint of research (De Paepe et al. 2024), this dataset is highly likely to be extensively reused and analysed by the community.

              

References

De Paepe M, Jeanneau L, Mariette J, Aumont O, Estevez-Torres A (2024) Purchases dominate the carbon footprint of research laboratories. PLOS Sustainability and Transformation, 3, e0000116. https://doi.org/10.1371/journal.pstr.0000116

Hulme PE (2021) Unwelcome exchange: international trade as a direct and indirect driver of biological invasions worldwide. One Earth, 4, 666–679. https://doi.org/10.1016/j.oneear.2021.04.015

Lombaert E, Klopp C, Blin A, Annonay G, Iampietro C, Lluch J, Sallaberry M, Valière S, Poloni R, Joron M, Deleury E (2025) Draft genome and transcriptomic sequence data of three invasive insect species. bioRxiv, ver. 2 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/2024.12.02.626401

Title: Draft genome and transcriptomic sequence data of three invasive insect species
Authors: Eric Lombaert, Christophe Klopp, Aurélie Blin, Gwenolah Annonay, Carole Iampietro, Jérôme Lluch, Marine Sallaberry, Sophie Valière, Riccardo Poloni, Mathieu Joron, Emeline Deleury
Abstract: Cydalima perspectalis (the box tree moth), Leptoglossus occidentalis (the western conifer seed bug), and Tecia solanivora (the Guatemalan tuber moth) are three economically harmful invasive insect species. This study ...
Thematic fields: Arthropods
Recommender: Vincent Lacroix
Reviewers: Jean-Marc Aury, Nicolas Parisot
Submission date: 2024-12-06 11:07:34