We are happy to announce the release of a new version of the stand-alone Prokaryotic Genome Annotation Pipeline (PGAP).
This version of PGAP offers a more streamlined experience to users who are uncertain about the taxonomic classification of the genomes they wish to annotate. Adding one flag to the command (--auto-correct-tax) makes PGAP override the species name provided as input if the taxonomy verification process predicts a different organism with high confidence.
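The override rule described above can be pictured with a small Python sketch. This is only an illustration of the decision, not PGAP's actual code: the function name `resolve_organism`, its parameters, and the boolean `high_confidence` flag are hypothetical stand-ins for whatever the taxonomy verification step reports internally.

```python
from typing import Optional


def resolve_organism(
    submitted: str,
    predicted: Optional[str],
    high_confidence: bool,
    auto_correct_tax: bool,
) -> str:
    """Return the organism name that would be used for annotation.

    Hypothetical helper: mirrors the rule stated in the announcement,
    not the real PGAP implementation.
    """
    # Without the flag, the submitted name is always kept.
    if not auto_correct_tax:
        return submitted
    # No prediction, or a low-confidence one, never overrides the input.
    if predicted is None or not high_confidence:
        return submitted
    # A confident prediction that differs from the input replaces it.
    if predicted != submitted:
        return predicted
    return submitted


if __name__ == "__main__":
    print(resolve_organism("Escherichia coli", "Shigella flexneri", True, True))
    print(resolve_organism("Escherichia coli", "Shigella flexneri", True, False))
```

In this sketch the submitted name survives in every case except the one the announcement names: the flag is set, a prediction exists, it is high confidence, and it differs from the input.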
In addition, with this new release, you can start annotating genes on your favorite genomes with Gene Ontology (GO) terms. The terms are derived from the Protein Family Models (hidden Markov models, BlastRules and domain architectures) that name the proteins. On average, a third of Coding Sequences (CDSs) annotated by this new version of PGAP will get at least one GO term. We are actively working on mapping more GO terms to our Protein Family Models so this percentage will grow with future PGAP releases. See more information in this blog post.
Additional features and bug fixes include:
- The incorporation of 17 RNA Family (RFAM) models for the annotation of more riboswitches.
- The introduction of a minimum coverage threshold of 20% in the taxonomy verification module. If the genome assembly does not match any type material assembly over at least 20% of its length, no organism name will be predicted.
- Assemblies for organisms without a genus in their lineage can now be annotated (bug fix).
- Running PGAP with Singularity without internet access (--no-internet) is now possible. Users need to point pgap.py to a local SIF image (converted from Docker) using the --container-path argument (bug fix).
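To make the 20% minimum coverage rule in the list above concrete, here is a minimal Python sketch. It assumes coverage is measured as matched bases divided by the length of the submitted assembly, and that exactly 20% passes; both are assumptions, as are the names `MIN_COVERAGE`, `TypeMatch`, and `predict_organism`, none of which come from PGAP itself.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_COVERAGE = 0.20  # threshold stated in the release notes


@dataclass
class TypeMatch:
    """Hypothetical record of a match against one type material assembly."""
    organism: str
    matched_bases: int  # bases of the query assembly covered by the match


def predict_organism(assembly_length: int, matches: List[TypeMatch]) -> Optional[str]:
    """Return the best-covered organism, or None if nothing reaches the threshold."""
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    best: Optional[TypeMatch] = None
    for match in matches:
        coverage = match.matched_bases / assembly_length
        # Matches below the minimum coverage are ignored entirely.
        if coverage < MIN_COVERAGE:
            continue
        if best is None or match.matched_bases > best.matched_bases:
            best = match
    return best.organism if best is not None else None


if __name__ == "__main__":
    hits = [TypeMatch("Organism A", 150_000), TypeMatch("Organism B", 900_000)]
    print(predict_organism(1_000_000, hits))      # Organism B (90% coverage)
    print(predict_organism(1_000_000, hits[:1]))  # None (15% coverage)
```

The real module presumably also weighs sequence identity and other evidence; this sketch isolates only the coverage filter.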
Please try this new version and share your experience with us!