Updated protein family models used by PGAP available for download

Release 3.0 of the NCBI protein family models used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available from our FTP site. Using the HMMER sequence analysis package, you can search this collection of hidden Markov models (HMMs) against your favorite prokaryotic proteins to identify their function.
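
As a rough illustration of such a search, here is a minimal Python sketch that assembles an hmmsearch command line and reads the tabular (--tblout) output that HMMER 3 writes. The file names passed in are hypothetical placeholders rather than the actual names in the release directory, and the E-value cutoff of 1e-5 is an arbitrary assumption, not a threshold recommended for these models.

```python
from typing import Dict, Iterable, List


def build_hmmsearch_command(hmm_file: str, protein_fasta: str, tblout: str,
                            evalue: float = 1e-5) -> List[str]:
    """Return an hmmsearch (HMMER 3) argument list.

    hmmsearch takes the HMM file first and the protein FASTA second.
    The default E-value cutoff is an assumption made for this sketch.
    """
    return ["hmmsearch", "--tblout", tblout, "-E", str(evalue),
            hmm_file, protein_fasta]


def parse_tblout(lines: Iterable[str]) -> List[Dict]:
    """Parse per-sequence hits from hmmsearch --tblout output.

    In hmmsearch output the "target" columns describe the protein and the
    "query" columns describe the HMM. Comment lines start with '#'.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 18 whitespace-separated columns, then a free-text description.
        fields = line.split(None, 18)
        if len(fields) < 18:
            continue  # skip malformed lines
        hits.append({
            "protein": fields[0],
            "model": fields[2],
            "model_accession": fields[3],
            "evalue": float(fields[4]),   # full-sequence E-value
            "score": float(fields[5]),    # full-sequence bit score
        })
    return hits


def best_hit_per_protein(hits: Iterable[Dict]) -> Dict[str, Dict]:
    """Keep the highest-scoring model for each protein."""
    best: Dict[str, Dict] = {}
    for hit in hits:
        current = best.get(hit["protein"])
        if current is None or hit["score"] > current["score"]:
            best[hit["protein"]] = hit
    return best
```

The command list can be passed to subprocess.run once HMMER is installed. Picking the top-scoring model per protein is a simplification for illustration only and is not a description of how PGAP itself chooses names.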

The 3.0 release contains 17,350 models: 12,864 HMMs built at NCBI (111 more than in release 2.0) and 4,486 TIGRFAM HMMs. In addition, since release 2.0, we have assigned product names to over 2,000 Pfam HMMs, bringing the total to 6,698 Pfam HMMs with names that can be transferred by PGAP to the annotated proteins they hit. You can access a table of these product names from the release directory.

Figure 1. The evidence for name assignment for type III secretion system (T3SS) translocon subunit SctB (NF038055) showing the protein matches. Species-specific names for this highly variable component of T3SS include YopD, EspB, IpaC, SipC, etc. Instead, we used the standard moniker for core genes of T3SS, Sct, Secretion and cellular translocation (PMID 26520801, PMID 9618447), providing a unified nomenclature for this secretion system.

For this release we focused on adding and modifying HMMs from two public databases: the Transporter Classification Database and the Virulence Factors of Pathogenic Bacteria database. Figure 1 shows an example of naming evidence for NCBI RefSeq proteins from a new model for type III secretion system (T3SS) translocon subunit SctB, NF038055.

PGAP uses the HMMs as hints for the annotation of protein-coding genes, and the HMMs are the source of many of the names assigned to PGAP-annotated proteins. We have now named more than 75 million RefSeq prokaryotic proteins based on these curated HMMs (protein query: "meta Evidence-For-Name-Assignment"[Properties] AND "Evidence category=HMM"[Text Word]). For HMM-named RefSeq proteins, the Evidence-For-Name-Assignment comment block of each record provides the details and a link to an individual web page for the HMM. See an example and more information on web displays of HMMs in a previous post.
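
If you would rather run the protein query above from a script than from the web interface, the sketch below shows one possible approach using the NCBI E-utilities ESearch endpoint. It only builds the request URL and extracts the hit count from a JSON response. The function names are our own for illustration, and we assume the query string behaves the same through E-utilities as it does in the web search box.

```python
import json
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The protein query quoted in the post, written with straight quotes.
HMM_NAMED_QUERY = ('"meta Evidence-For-Name-Assignment"[Properties] '
                   'AND "Evidence category=HMM"[Text Word]')


def build_esearch_url(term: str, db: str = "protein", retmax: int = 0) -> str:
    """Build an ESearch URL; retmax=0 requests the count without any IDs."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_count(response_text: str) -> int:
    """Extract the total number of matching records from an ESearch JSON reply."""
    return int(json.loads(response_text)["esearchresult"]["count"])
```

Fetching the URL, for example with urllib.request.urlopen, and passing the response body to parse_count would give the current number of matching records, which will differ from the figure quoted here as RefSeq grows.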

 

 
