Protein Sequence Databases

The document outlines the distinction between primary and secondary protein sequence databases. Primary databases, such as UniProt and NCBI Protein, store experimentally determined sequences, while secondary databases, like Pfam and InterPro, analyze and classify these sequences for additional insights. This structured approach aids researchers in accessing raw data and enriched information for various applications in genomics and proteomics.

Uploaded by

asdfj7505

**Protein Sequence Databases: Primary and Secondary**

**1. Primary Databases**


Primary databases store protein sequences and their
associated metadata. The sequences are either determined
experimentally or, far more often, obtained by translating
submitted nucleotide sequences. These databases serve as
repositories for data submitted by researchers, usually with
annotations such as source organism, function, and
literature references. Key examples include:

- **UniProt** (Universal Protein Resource):
  - **Swiss-Prot**: Manually curated (reviewed) entries with
    detailed annotations, including function, structure, and
    post-translational modifications.
  - **TrEMBL**: Automatically annotated (unreviewed) entries
    awaiting curation, derived from translations of
    EMBL-Bank/GenBank/DDBJ nucleotide records.
  - **UniProtKB**: The UniProt Knowledgebase, made up of the
    Swiss-Prot and TrEMBL sections, offering comprehensive
    coverage.

- **NCBI Protein**: Part of the Entrez system, aggregating
data from GenBank, RefSeq, and PDB. RefSeq provides
non-redundant, curated sequences.
- **DDBJ** (DNA Data Bank of Japan): Collaborates with
GenBank and ENA to archive nucleotide sequences, with
protein translations available.

- **PIR** (Protein Information Resource): Now part of
UniProt, historically focused on protein classification.
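
The Swiss-Prot/TrEMBL split described above is visible in the FASTA files UniProtKB distributes: header lines begin with `>sp|` for Swiss-Prot entries and `>tr|` for TrEMBL entries, followed by the accession and entry name. The following is a minimal sketch of reading that prefix; the function and class names are illustrative, not part of any UniProt tooling, and it handles only this header layout.

```python
import re
from typing import NamedTuple

# UniProtKB FASTA headers look like ">db|ACCESSION|ENTRY_NAME description",
# where db is "sp" (Swiss-Prot, reviewed) or "tr" (TrEMBL, unreviewed).
HEADER_RE = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)")


class UniProtId(NamedTuple):
    section: str      # "Swiss-Prot" or "TrEMBL"
    accession: str    # e.g. "P69905"
    entry_name: str   # e.g. "HBA_HUMAN"
    reviewed: bool    # True only for Swiss-Prot entries


def parse_uniprot_header(header: str) -> UniProtId:
    """Extract section, accession, and entry name from a FASTA header line."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"not a UniProtKB FASTA header: {header!r}")
    db, accession, entry_name = match.groups()
    section = "Swiss-Prot" if db == "sp" else "TrEMBL"
    return UniProtId(section, accession, entry_name, reviewed=(db == "sp"))


example = (">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha "
           "OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2")
print(parse_uniprot_header(example))
```

Filtering a mixed FASTA file on the `reviewed` flag is one simple way to restrict an analysis to manually curated entries.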

**2. Secondary Databases**


Secondary databases analyze, classify, or predict features
from primary data, adding value through computational or
manual curation. They focus on domains, families,
structures, or functional annotations. Examples include:

- **Pfam**: Protein family database using hidden Markov
models (HMMs) to identify domains and families.
- **PROSITE**: Catalogs protein domains, families, and
functional sites using patterns and profiles.
- **InterPro**: Integrates multiple databases (Pfam,
PROSITE, PRINTS, etc.) to provide comprehensive protein
signature analysis.
- **PRINTS**: Fingerprint database for protein motif
identification.
- **SMART**: Focuses on domain architectures, particularly
in signaling and extracellular proteins.
- **CDD** (Conserved Domain Database): Annotates
conserved domains using tools like RPS-BLAST.
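
To make the "patterns" used by PROSITE concrete: a pattern such as `N-{P}-[ST]-{P}` (the N-glycosylation site) reads as "N, then any residue except P, then S or T, then any residue except P". The sketch below translates this notation into a Python regular expression. It is a simplified illustration written for these notes, not PROSITE's own software; it covers `x`, `[...]`, `{...}`, repeat counts such as `x(2,4)`, and the `<`/`>` terminal anchors, and ignores rarer constructs. The test sequence is made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # published patterns end with a period
    parts = []
    for element in pattern.split("-"):
        # "<" anchors the pattern at the N-terminus, ">" at the C-terminus.
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        residue, repeat = m.group(1), m.group(2)
        if residue == "x":                       # x = any amino acid
            residue = "."
        elif residue.startswith("{") and residue.endswith("}"):
            residue = "[^" + residue[1:-1] + "]"  # {P} = anything but P
        count = "{" + repeat + "}" if repeat else ""
        parts.append(prefix + residue + count + suffix)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)

# Made-up sequence used only to exercise the pattern.
toy_sequence = "MKNGSAANPTQNRTL"
for hit in re.finditer(regex, toy_sequence):
    print(hit.start(), hit.group())
```

Note that `re.finditer` reports non-overlapping matches only, so overlapping sites would need a lookahead-based search.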

**Structural and Functional Secondary Databases**:


- **SCOP** (Structural Classification of Proteins) &
**CATH**: Classify protein structures into hierarchies (e.g.,
folds, superfamilies).
- **KEGG**: Maps proteins to metabolic pathways and
functional networks.
- **STRING**: Predicts protein-protein interactions based on
genomic context and experimental data.

**Key Differences**:

- **Primary**: Store raw sequences (e.g., UniProt).
- **Secondary**: Provide derived information (e.g., Pfam for
families, SCOP for structural classification).

**Applications**:

- **Primary**: Direct access to sequence data for research
like cloning or phylogenetics.
- **Secondary**: Facilitate functional annotation,
evolutionary studies, and structural predictions.

**Integration**: Search tools such as BLAST compare a query
against primary databases to find similar sequences, while
secondary databases help interpret the hits (e.g.,
identifying the domains present in BLAST hits via
InterPro).
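
As a rough sketch of that workflow, the code below attaches domain annotations (the secondary layer) to a list of similarity-search hits (the primary layer). Everything here is hypothetical: the `Hit` class, the `annotate_hits` function, the accessions, and the domain table are placeholders standing in for real BLAST output and a real InterPro lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    accession: str   # identifier of the matched primary-database entry
    e_value: float   # significance reported by the similarity search
    domains: list = field(default_factory=list)  # filled in from secondary data


def annotate_hits(hits, domain_table):
    """Attach secondary-database domain names to each search hit."""
    for hit in hits:
        hit.domains = sorted(domain_table.get(hit.accession, []))
    return hits


# Hypothetical search results and a hypothetical accession-to-domain table.
hits = [Hit("PROT_A", 1e-50), Hit("PROT_B", 3e-8)]
domain_table = {"PROT_A": {"kinase domain", "SH2 domain"}}

for hit in annotate_hits(hits, domain_table):
    print(hit.accession, hit.e_value, hit.domains or "no annotated domains")
```

Keeping the search results and the annotation table separate mirrors the division of labour between the two database tiers: the hit list can be regenerated without touching the annotations, and vice versa.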

Together, the two tiers give researchers access to both the
underlying sequence data and the annotation derived from it,
supporting work in genomics and proteomics.
