Journal of Applied Crystallography, 2011
…
Repeats are two or more contiguous segments of amino acid residues that are believed to have arisen through intragenic duplication, recombination and mutation events. These repeats can be used for protein structure prediction and can provide insights into protein evolution and phylogenetic relationships. Therefore, to aid structural biologists and phylogeneticists in their research, a computing resource (a web server and a database), Repeats in Protein Sequences (RPS), has been created. Using RPS, users can obtain information on identical, similar and distant repeats (of varying lengths) in protein sequences. In addition, users can check the frequency of occurrence of the repeats in sequence databases such as the Genome Database, PIR and SWISS-PROT, and among the protein sequences available in the Protein Data Bank archive. Furthermore, users can view the three-dimensional structure of the repeats using the Java visualization plug-in Jmol. The proposed comp...
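The simplest of the three repeat classes RPS reports is the identical repeat. As a minimal sketch (not the RPS algorithm, which is not described here), identical repeats of a fixed length k can be found by indexing every length-k substring of a sequence and keeping those seen more than once. The function name `identical_repeats` and the fixed-k formulation are assumptions for illustration; similar and distant repeats would need alignment-based scoring instead.

```python
from collections import defaultdict


def identical_repeats(seq: str, k: int) -> dict[str, list[int]]:
    """Hypothetical helper: map every length-k substring that occurs at least
    twice in seq to its 0-based start positions (occurrences may overlap)."""
    positions: dict[str, list[int]] = defaultdict(list)
    # Slide a window of width k across the sequence and record where each k-mer starts.
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    # Keep only k-mers seen more than once, i.e. identical repeats of length k.
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


print(identical_repeats("MKLVAAGKLVAA", 4))
```

Running the example reports `KLVA` at positions 1 and 7 and `LVAA` at positions 2 and 8. Repeating the call over a range of k values gives repeats "of varying lengths" in the sense used above.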
Journal of Structural Biology, 2001
Internal repetition within proteins has been a successful stratagem on multiple separate occasions throughout evolution. Such protein repeats possess regular secondary structures and form multirepeat assemblies in three dimensions of diverse sizes and functions. In general, internal repetition affords a protein enhanced evolutionary prospects owing to an enlargement of its available binding surface area. Constraints on sequence conservation appear to be relatively lax, because binding functions arise from multiple, rather than single, repeats. Considerable sequence divergence, together with the short lengths of sequence repeats, means that repeat detection can be a particularly arduous task. We also consider the conundrum of how multiple repeats, which show strong structural and functional interdependencies, ever evolved from a single repeat ancestor. In this review, we illustrate each of these points by referring to six prolific repeat types (repeats in β-propellers and β-trefoils, and tetratricopeptide, ankyrin, armadillo/HEAT and leucine-rich repeats) and to other less prolific but nonetheless interesting repeats.
Trends in Biochemical Sciences, 2000
Protein Science, 2000
All the protein sequences in the SWISS-PROT database were analyzed for the occurrence of single amino acid repeats, tandem oligopeptide repeats and periodically conserved amino acids. Single amino acid repeats of glutamine, serine, glutamic acid, glycine and alanine seem to be tolerated to a considerable extent in many proteins. Tandem oligopeptide repeats of different types, with varying levels of conservation, were detected in several proteins and found to be conspicuous, particularly in structural and cell-surface proteins. It appears that repeated sequence patterns may be a mechanism that provides regular arrays of spatial and functional groups, useful for structural packing or for one-to-one interactions with target molecules. To facilitate further exploration, a database of Tandem Repeats in Protein Sequences (TRIPS) has been developed and is available at URL: http://www.ncl-india.org/trips.
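Single amino acid repeats such as polyglutamine tracts are runs of one residue. A minimal sketch of detecting them, assuming a hypothetical minimum run length of 5 (the threshold used for TRIPS is not given above), groups consecutive identical residues and reports the long runs:

```python
from itertools import groupby


def single_aa_repeats(seq: str, min_len: int = 5) -> list[tuple[str, int, int]]:
    """Hypothetical helper: return (residue, 0-based start, run length) for every
    run of a single amino acid that is at least min_len residues long."""
    runs = []
    pos = 0
    # groupby yields maximal blocks of consecutive identical characters.
    for residue, group in groupby(seq):
        n = sum(1 for _ in group)
        if n >= min_len:
            runs.append((residue, pos, n))
        pos += n
    return runs


print(single_aa_repeats("MSQQQQQQPAAAAAG"))
```

With the default threshold the example finds a six-residue glutamine run at position 2 and a five-residue alanine run at position 9.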
Nucleic Acids Research, 2012
ProRepeat (http://prorepeat.bioinformatics.nl/) is an integrated, curated repository and analysis platform for in-depth research on the biological characteristics of amino acid tandem repeats. ProRepeat collects repeats from all proteins included in the UniProt knowledgebase, together with 85 completely sequenced eukaryotic proteomes contained in the RefSeq collection. It contains non-redundant perfect tandem repeats, approximate tandem repeats and simple, low-complexity sequences, covering the majority of the amino acid tandem repeat patterns found in proteins. The ProRepeat web interface allows querying of the repeat database by repeat characteristics such as the repeat unit and its length, the number of repetitions of the repeat unit and the position of the repeat in the protein. Users can also search for repeats by the characteristics of repeat-containing proteins, such as entry ID, protein description, sequence length, gene name and taxon. ProRepeat offers powerful analysis tools for finding biologically interesting properties of repeats, such as the strong position bias of leucine repeats toward the N-terminus of eukaryotic protein sequences, differences in repeat abundance among proteomes, the functional classification of repeat-containing proteins and GC-content constraints on the codons encoding the repeats.
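The query fields listed above (repeat unit, number of repetitions, position) map naturally onto a small record per perfect tandem repeat. The sketch below is an illustration only, not ProRepeat's detection method: it assumes a fixed unit length and scans greedily left to right, reporting each maximal perfect run as `(unit, start, copies)` and skipping past it so hits do not overlap. The name `perfect_tandem_repeats` and the `min_copies` default are hypothetical.

```python
def perfect_tandem_repeats(seq: str, unit_len: int, min_copies: int = 2) -> list[tuple[str, int, int]]:
    """Hypothetical helper: greedy left-to-right scan for perfect tandem repeats
    with a fixed unit length; returns (unit, 0-based start, number of copies)."""
    hits = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies = 1
        # Extend while the next block is an exact copy of the unit.
        # A slice running past the end is shorter than the unit and stops the loop.
        while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
            copies += 1
        if copies >= min_copies:
            hits.append((unit, i, copies))
            i += copies * unit_len  # jump past the run to keep hits non-overlapping
        else:
            i += 1
    return hits


print(perfect_tandem_repeats("MPEPEPEKAQAQG", 2))
```

The example yields `("PE", 1, 3)` and `("AQ", 8, 2)`. Approximate tandem repeats, which ProRepeat also stores, would require tolerating mismatches between copies and are not covered by this sketch.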
Frontiers in Bioengineering and Biotechnology, 2019
Amino acid repeats play an important role in the structure and function of proteins. Analysis of long repeats in protein sequences makes it possible to understand their abundance, structure and function in the protein universe. In the present study, amino acid repeats longer than 50 residues (long repeats) were identified in a non-redundant set of UniProt sequences using the RADAR program. The structures and functions underlying these long repeats were analyzed using Gene3D for structural domains, Pfam for functional domains, and an enzyme/non-enzyme functional classification for the catalytic and binding functions of the proteins. From a structural perspective, these long repeats seem to occur predominantly in certain architectures, such as sandwich, bundle, barrel and roll, and within these architectures they are abundant in the superfolds. The lengths of the repeats within each fold are not uniform, exhibiting different structures for different functions. We also observed that long repeats lie in the domain regions of the family and are involved in the function of the proteins. After grouping by enzyme and non-enzyme classes, we observed an abundant occurrence of long repeats in specific catalytic and binding classes of proteins. In this study, beyond the well-characterized short tandem repeats, we have analyzed the occurrence of long repeats in the protein sequence universe and the structures and functions of the corresponding proteins at the domain level. The present study suggests that long repeats may play an important role in the structure and function of protein domains.
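Deciding whether a long repeat lies "in the domain regions" reduces to an interval-overlap test between repeat coordinates and domain assignments. The sketch below assumes hypothetical inputs: repeats as `(start, end)` pairs (for example, parsed from RADAR output) and domains as `(name, start, end)` triples (for example, from Gene3D or Pfam), all 1-based and inclusive. It does not reproduce the study's pipeline; it only shows the length filter (> 50 residues) and the overlap check.

```python
def long_repeats_in_domains(repeats, domains, min_len=51):
    """Hypothetical helper: for each repeat (start, end) of at least min_len
    residues, list the names of domains (name, start, end) that overlap it.
    Coordinates are assumed to be 1-based and inclusive."""
    result = []
    for r_start, r_end in repeats:
        if r_end - r_start + 1 < min_len:
            continue  # drop short repeats; "long" means more than 50 residues
        # Two closed intervals overlap iff each starts before the other ends.
        hits = [name for name, d_start, d_end in domains
                if r_start <= d_end and d_start <= r_end]
        result.append(((r_start, r_end), hits))
    return result


print(long_repeats_in_domains(
    [(10, 40), (100, 180), (200, 300)],
    [("PF_A", 90, 150), ("PF_B", 250, 320)],  # placeholder domain names
))
```

The 31-residue repeat is discarded; the two long repeats overlap `PF_A` and `PF_B` respectively.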
Genome Research, 2007
Scientific Reports, 2016
Some natural proteins display recurrent structural patterns. Despite being highly similar at the tertiary-structure level, repetitions within a single repeat protein can be extremely variable at the sequence level. We propose a mathematical definition of a repeat and investigate its occurrence in different protein families. We found that long stretches of perfect repetitions are infrequent in individual natural proteins, even in those known to fold into structures of recurrent structural motifs. We found, however, that natural repeat proteins are indeed repetitive within their families, exhibiting abundant stretches of 6 amino acids or longer that are perfect repetitions in the reference family. We provide a systematic quantification of this repetitiveness. We show that this form of repetitiveness is not exclusive to repeat proteins but also occurs in globular domains. A by-product of this work is a fast classifier of proteins into families, which yields a likelihood value for a given protein belonging to a given family.
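One simple way to make "stretches of 6 amino acids or longer that are perfect repetitions in the reference family" concrete is to mark every position of a query covered by a 6-mer that also occurs somewhere in the family. The sketch below is a simplified reading, not the paper's mathematical definition: it assumes the family is given as a list of unaligned sequences and reports the fraction of query positions covered. The name `family_kmer_coverage` is hypothetical.

```python
def family_kmer_coverage(query: str, family: list[str], k: int = 6) -> float:
    """Hypothetical helper: fraction of query positions covered by at least one
    length-k substring that also occurs in some sequence of the family."""
    # Every length-k substring present anywhere in the reference family.
    family_kmers = {s[i:i + k] for s in family for i in range(len(s) - k + 1)}
    covered = [False] * len(query)
    for i in range(len(query) - k + 1):
        if query[i:i + k] in family_kmers:
            # Overlapping shared k-mers merge into longer perfect stretches.
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(query) if query else 0.0


print(family_kmer_coverage("MKTAYIAKQR", ["GGMKTAYIWW"]))
```

Here only `MKTAYI` is shared, so 6 of the 10 query positions are covered (0.6). When computing this for a family member, the query itself should be excluded from `family`, otherwise coverage is trivially 1.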
BMC Bioinformatics, 2006
Background: Genome-wide and cross-species comparisons of amino acid repeats are an intriguing problem in biology, mainly because of the highly polymorphic nature and diverse functions of amino acid repeats. Innate protein repeats constitute vital functional and structural regions in proteins. Repeats are of great consequence in the evolution of proteins, as is evident from analyses of repeats in different organisms. In the post-genomic era, the availability of protein sequences encoded in different genomes provides a unique opportunity to perform large-scale comparative studies of amino acid repeats. ProtRepeatsDB http://bioinfo.icgeb.res.in/repeats/ is a relational database of perfect and mismatch repeats, designed as a resource and collection of tools for the detection and cross-species comparison of different types of amino acid repeats.
Description: ProtRepeatsDB (v1.2) consists of perfect as well as mismatch amino acid repeats in the protein sequences of 141 organisms whose genomes are now available. The web interface of ProtRepeatsDB provides tools to perform repeat searches based on protein IDs, organism name, repeat sequences, keywords in FASTA headers, size, frequency, Gene Ontology (GO) annotation IDs and regular expressions (REGEXP) describing repeats. These tools also allow the formulation of a variety of simple, complex and logical queries to facilitate mining and large-scale cross-species comparisons of amino acid repeats. In addition, the database contains sequence analysis tools to determine repeats in user input sequences.
Conclusion: ProtRepeatsDB is a multi-organism database of different types of amino acid repeats present in proteins. It integrates tools to perform genome-wide queries for rapid screening and identification of amino acid repeats and facilitates comparative and evolutionary studies of the repeats. The database is useful for identifying species- or organism-specific repeat markers, interspecies variations and polymorphism.
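Regular-expression queries of the kind ProtRepeatsDB supports can be illustrated with Python's standard `re` module. This is a sketch of the idea only, not the database's query engine: the input is assumed to be a dict of protein ID to sequence, and the example pattern (a glutamine run of 4 or more, or at least three tandem `PE` units) is hypothetical.

```python
import re


def regex_repeat_search(sequences: dict[str, str], pattern: str) -> list[tuple[str, int, int, str]]:
    """Hypothetical helper: return (protein ID, start, end, matched text) for every
    non-overlapping match of a regular expression describing a repeat.
    start/end follow Python slice conventions (0-based, end exclusive)."""
    regex = re.compile(pattern)
    return [(sid, m.start(), m.end(), m.group())
            for sid, seq in sequences.items()
            for m in regex.finditer(seq)]


proteins = {"p1": "MQQQQQLPEPEPEK", "p2": "MKLAAGG"}
print(regex_repeat_search(proteins, r"Q{4,}|(?:PE){3,}"))
```

The example reports `QQQQQ` (positions 1 to 6) and `PEPEPE` (positions 7 to 13) in `p1` and nothing in `p2`. Quantifiers such as `{3,}` express repeat size and frequency directly, which is why regular expressions suit this kind of query; mismatch repeats would need character classes or an approximate-matching step.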
Nucleic Acids Research, 2014
RepeatsDB (http://repeatsdb.bio.unipd.it/) is a database of annotated tandem repeat protein structures. Tandem repeats pose a difficult problem for the analysis of protein structures, as the underlying sequence can be highly degenerate. Several repeat types have been studied over the years, but their annotation was done on a case-by-case basis, making large-scale analysis difficult. We developed RepeatsDB to fill this gap. Using state-of-the-art repeat detection methods and manual curation, we systematically annotated the Protein Data Bank, predicting 10 745 repeat structures. In all, 2797 structures were classified according to a recently proposed classification schema, which was expanded to accommodate new findings. In addition, detailed annotations were performed on a subset of 321 proteins. These annotations include the start and end positions of the repeat regions and units. RepeatsDB is an ongoing effort to classify and annotate structural protein repeats systematically and consistently. It allows users to access and download high-quality datasets either interactively or programmatically through web services.
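Annotations that give start and end positions for a repeat region and its units suggest a simple record with a consistency check. The dataclass below is a hypothetical schema for illustration, not RepeatsDB's actual data format: it assumes 1-based, inclusive residue positions and checks that every unit lies inside its region and that units do not overlap.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RepeatRegion:
    """Hypothetical record for one annotated repeat region; positions are
    assumed to be 1-based and inclusive."""
    start: int
    end: int
    units: list[tuple[int, int]] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if a unit falls outside the region or overlaps another unit."""
        previous_end = self.start - 1
        for u_start, u_end in sorted(self.units):
            if not (self.start <= u_start <= u_end <= self.end):
                raise ValueError(f"unit {u_start}-{u_end} lies outside region {self.start}-{self.end}")
            if u_start <= previous_end:
                raise ValueError(f"unit {u_start}-{u_end} overlaps the previous unit")
            previous_end = u_end

    def mean_unit_length(self) -> float:
        """Average unit length in residues (0.0 if no units are annotated)."""
        if not self.units:
            return 0.0
        return sum(e - s + 1 for s, e in self.units) / len(self.units)


region = RepeatRegion(10, 90, [(10, 49), (50, 90)])
region.validate()
print(region.mean_unit_length())
```

The example region has units of 40 and 41 residues, so the mean unit length is 40.5. Keeping units sorted and non-overlapping makes downstream tasks such as extracting unit sequences or comparing annotations straightforward.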
Proteins: Structure, Function, and Genetics, 1993
An automated algorithm is presented that delineates protein sequence fragments that display similarity. The method selects a number of local, non-overlapping sequence alignments with the highest similarity scores and uses a graph-theoretical approach to determine consistent start and end points of the fragments comprising one or more ensembles of related subsequences. The procedure allows the simultaneous identification of different types of repeats within one sequence. A multiple alignment of the resulting fragments is performed and a consensus sequence is derived from the ensemble(s). Finally, a profile is constructed from the multiple alignment to detect possible, more distant members within the sequence. The method tolerates mutations in the repeats as well as insertions and deletions. The sequence spans between the various repeats or repeat clusters may be of different lengths. The technique has been applied to a number of proteins whose repeating fragments had been derived from information additional to the protein sequences. © 1993 Wiley-Liss, Inc.
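The last two steps described above, deriving a consensus from the aligned fragments and building a profile to find more distant copies, can be sketched as follows. This is a simplified illustration under stated assumptions, not the published method: the profile is ungapped (so, unlike the original, it does not tolerate insertions or deletions), it uses log-odds scores against a uniform background over the 20 standard amino acids with a pseudocount of 1, and the scoring threshold is chosen by the caller. The function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def consensus(aligned: list[str]) -> str:
    """Most frequent non-gap residue in each column of equal-length aligned
    fragments; '-' for a column that contains only gaps."""
    letters = []
    for col in zip(*aligned):
        counts = Counter(c for c in col if c != "-")
        letters.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(letters)


def build_profile(aligned: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-column natural-log-odds scores against a uniform background.
    Gaps and non-standard letters are ignored when counting."""
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in zip(*aligned):
        counts = Counter(c for c in col if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: math.log((counts[aa] + pseudocount) / total / background)
                        for aa in AMINO_ACIDS})
    return profile


def scan(seq: str, profile: list[dict[str, float]], threshold: float) -> list[tuple[int, float]]:
    """Slide the ungapped profile along seq; return (0-based start, score)
    for windows scoring at least threshold. Unknown residues score 0."""
    width = len(profile)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, seq[i:i + width]))
        if score >= threshold:
            hits.append((i, round(score, 2)))
    return hits


fragments = ["LRLS", "LKLS", "LRIS"]
print(consensus(fragments))
print(scan("MMLRLSMM", build_profile(fragments), threshold=3.0))
```

The three toy fragments give the consensus `LRLS`, and scanning the test sequence reports a single window at position 2 (score about 4.41). In practice the threshold would be calibrated, for example against shuffled sequences, rather than fixed by hand.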
2010 International Conference on Complex, Intelligent and Software Intensive Systems, 2010
Genome Biology, 2009
Nucleic Acids Research
Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 2012
Bioinformatics (Oxford, England), 2016
Journal of Molecular Biology, 2001
Bioinformatics
Structure, 1995
Trends in Genetics, 2010