
Composite Dbs

The document discusses the proliferation of primary sequence databases (dbs) and the challenges in choosing the most accurate, up-to-date, and comprehensive options. It highlights various dbs such as NRL-3D, PIR, SWISS-PROT, and composite dbs like NRDB, OWL, and MIPSX, each with unique features and limitations. The document suggests that composite dbs can streamline searches by amalgamating multiple sources, thus improving efficiency and reducing redundancy.

Uploaded by

sadia.202204062

An embarras de richesses

The proliferation of primary sequence dbs gives rise to a number of questions:

Do they all have the same format?
Which is the most accurate?
Which is the most up-to-date?
Which is the most comprehensive?
Given the choice, which should we use?

Of the protein sequence dbs, NRL-3D is the least comprehensive because it reflects only the contents of PDB, yet it has the advantage of relating directly to structural information.

PIR (1-4) is the most comprehensive resource, but the quality of its annotations is still relatively poor.

SWISS-PROT, on the other hand, is a highly structured db that provides excellent annotations, but its sequence coverage is poor compared to PIR.

Choosing the right db to search can seem impossible; so is it, perhaps, better to search them all?

Composite Protein Sequence Dbs

One solution to the problem of proliferating primary dbs is to compile a composite, i.e. a db that amalgamates a variety of different primary sources.

Composite dbs make sequence searching much more efficient, because they obviate the need to interrogate multiple resources.
The interrogation process is streamlined still further if the composite has been designed to be non-redundant, as this means that the same sequence need not be searched more than once.
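
As a rough illustration of why a composite saves effort, the sketch below merges several primary sources into one collection and searches it in a single pass. The source names come from the text, but the sequences, the accession labels, and the functions `build_composite` and `search` are invented for this example; real composites are built from full db releases, not from small dictionaries.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    source: str      # primary db the entry came from
    accession: str   # hypothetical identifier, not a real accession
    sequence: str    # one-letter amino-acid codes


def build_composite(sources: dict[str, dict[str, str]]) -> list[Entry]:
    """Amalgamate every primary source into one list, keeping provenance."""
    composite = []
    for source, entries in sources.items():
        for accession, sequence in entries.items():
            composite.append(Entry(source, accession, sequence))
    return composite


def search(composite: list[Entry], motif: str) -> list[Entry]:
    """Return entries whose sequence contains the motif (exact substring)."""
    return [entry for entry in composite if motif in entry.sequence]


# Toy data: made-up sequences, not real db records.
sources = {
    "SWISS-PROT": {"SP001": "MKTAYIAKQR", "SP002": "GSHMLEDPVA"},
    "PIR": {"PIR001": "MKTAYIAKQR", "PIR002": "AAKQRLLNDE"},
    "NRL-3D": {"NRL001": "GSHMLEDPVA"},
}

composite = build_composite(sources)
hits = search(composite, "AKQR")  # one query instead of one per source
for hit in hits:
    print(hit.source, hit.accession)
```

In this toy data the same sequence is reported twice (once from SWISS-PROT and once from PIR), which is the duplication that a non-redundant composite is meant to avoid.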
Different strategies can be used to create composite resources.
The final product depends on the chosen data sources and the criteria used to merge them. For example:

A composite resource will be non-identical if it eliminates only identical sequence copies during the amalgamation process.

But if both identical and highly similar sequences are removed (e.g. those entries that differ by only one residue), then the resulting db will be more truly non-redundant.
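
The difference between the two criteria can be sketched in a few lines. This is a minimal illustration, not the procedure used by any of the composites discussed here: it assumes that "differ by only one residue" means a single substitution between sequences of equal length, it keeps whichever copy appears first, and the sequences and function names are invented for the example.

```python
def remove_identical(sequences: list[str]) -> list[str]:
    """Keep the first copy of each exact sequence (a 'non-identical' set)."""
    seen = set()
    kept = []
    for seq in sequences:
        if seq not in seen:
            seen.add(seq)
            kept.append(seq)
    return kept


def nearly_identical(a: str, b: str, max_mismatches: int = 1) -> bool:
    """True if equal-length sequences differ at no more than max_mismatches positions."""
    if len(a) != len(b):
        return False  # assumption: insertions and deletions are not considered
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches <= max_mismatches


def remove_redundant(sequences: list[str], max_mismatches: int = 1) -> list[str]:
    """Drop sequences identical or nearly identical to one already kept."""
    kept = []
    for seq in sequences:
        if not any(nearly_identical(seq, k, max_mismatches) for k in kept):
            kept.append(seq)
    return kept


# Toy merged input: an exact duplicate, a one-residue variant, and an unrelated sequence.
merged = ["MKTAYIAKQR", "MKTAYIAKQR", "MKTAYIAKQK", "GSHMLEDPVA"]
non_identical = remove_identical(merged)
non_redundant = remove_redundant(merged)
print(len(merged), len(non_identical), len(non_redundant))  # prints: 4 3 2
```

The pairwise comparison in `remove_redundant` scales quadratically with the number of sequences, so it only suits small examples; the stricter criterion also discards the one-residue variant, which may or may not be desirable depending on the search.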
The choice of different sources and the application of different
redundancy criteria have led to the emergence of different
composites, each of which has its own particular format.
The main dbs are outlined below.

NRDB: The Non-Redundant Db is built at the NCBI.

The db is a composite of GenPept (derived from automatic GenBank CDS translations), PDB sequences, SWISS-PROT, SPupdate (the weekly updates of SWISS-PROT), PIR and GenPeptupdate (the daily updates of GenPept).

This db is thus comprehensive and contains up-to-date information.
However, strictly speaking, it is not non-redundant but non-identical, i.e. only identical sequence copies are removed from the resource.
OWL: A non-redundant protein sequence db built at the University of Leeds in collaboration with the Daresbury Laboratory in Warrington.

The db is a composite of four major primary sources: SWISS-PROT, PIR 1-4, GenBank (CDS translations) and NRL-3D.

MIPSX: A merged db produced at the Max-Planck Institut in Martinsried.

The db contains information from the following resources: PIR 1-4; MIPS preliminary entries (MIPSOwn); MIPS/PIR preliminary entries (PIRMOD); MIPS preliminary translations (MIPSTrn); MIPS yeast entries (MIPSH); NRL-3D; SWISS-PROT; EMTrans; GBTrans; Kabat; and PSeqIP.
SWISS-PROT + TrEMBL: At the EBI, the combination of SWISS-PROT and TrEMBL provides a resource that is both comprehensive and minimally redundant.

This db has the advantage of containing fewer errors than those mentioned above.
