Genome informatics. International Conference on Genome Informatics, 2005
Glycan resources such as carbohydrate databases, analysis tools, and algorithms for analysis of carbohydrate features have been developed of late. With this background, bioinformatics approaches to carbohydrate research have recently begun using a large amount of protein and carbohydrate data. This paper introduces one of these projects that elucidates the range of carbohydrate structures. In this study, the variety of carbohydrate structures has been enumerated in a global tree structure called variation trees, using the KEGG GLYCAN database, which is a public-domain glycan resource for bioinformatics analysis. Additionally, a glycosyltransferase mapping list of glycosyltransferases and their catalyzing glycosidic linkages was constructed. From this, we present the composite structure map (CSM), which is a structural variation map integrating the variation trees and the glycosyltransferase map list. CSM is able to display, for example, expression data of glycosyltransferases in a com...
Mass spectra provide the ultimate evidence for supporting the findings of mass spectrometry (MS) proteomics studies in publications, and it is therefore crucial to be able to trace the conclusions back to the spectra. The Universal Spectrum Identifier (USI) provides a standardized mechanism for encoding a virtual path to any mass spectrum contained in datasets deposited to public proteomics repositories. USIs enable greater transparency for providing spectral evidence in support of key findings in publications, with more than 1 billion USI identifications from over 3 billion spectra already available through ProteomeXchange repositories.
DBTSS (Database of Transcriptional Start Sites)/DBKERO (Database of Kashiwa Encyclopedia for human genome mutations in Regulatory regions and their Omics contexts) is a database originally built around information on transcriptional start sites and their upstream transcriptional regulatory regions. In recent years, we updated the database to help users elucidate the biological relevance of human genome variations or somatic mutations in cancers that may affect transcriptional regulation. In this update, we facilitate the interpretation of disease-associated genomic variation, using the Japanese population as a model case. We enriched the genomic variation dataset consisting of the 13,368 individuals collected for various genome-wide association studies and the reference epigenome information in the surrounding regions using a total of 455 epigenome datasets (four tissue types from 67 healthy individuals) collected for the International Human Epigenome Consortium (IHEC...
Major advancements have recently been made in mass spectrometry-based proteomics, yielding an increasing number of datasets from various proteomics projects worldwide. In order to facilitate the sharing and reuse of promising datasets, it is important to construct appropriate, high-quality public data repositories. jPOSTrepo (https://repository.jpostdb.org/) has successfully implemented several unique features, including high-speed file uploading, flexible file management and easy-to-use interfaces. This repository has been launched as a public repository containing various proteomic datasets and is available for researchers worldwide. In addition, our repository has joined the ProteomeXchange consortium, which includes the most popular public repositories such as PRIDE in Europe for MS/MS datasets and PASSEL for SRM datasets in the USA. Later MassIVE was introduced in the USA and accepted into the ProteomeXchange, as was our repository in July 2016, providing important datasets fro...
Background: Glycoscience is a research field focusing on complex carbohydrates (otherwise known as glycans), which can, for example, serve as "switches" that toggle between different functions of a glycoprotein or glycolipid. Due to the advancement of glycomics technologies that are used to characterize glycan structures, many glycomics databases are now publicly available and provide useful information for glycoscience research. However, these databases have almost no link to other life science databases. Results: In order to implement support for the Semantic Web most efficiently for glycomics research, the developers of major glycomics databases agreed on a minimal standard for representing glycan structure and annotation information using RDF (Resource Description Framework). Moreover, all of the participants implemented this standard prototype and generated preliminary RDF versions of their data. To test the utility of the converted data, all of the data sets were uploaded into a Virtuoso triple store, and several SPARQL queries were tested as "proofs-of-concept" to illustrate the utility of the Semantic Web in running cross-database queries that were originally difficult to implement. Conclusions: We were able to successfully retrieve information by linking UniCarbKB, GlycomeDB and JCGGDB in a single SPARQL query to obtain our target information. We also tested queries linking UniProt with GlycoEpitope as well as lectin data with GlycomeDB through PDB. As a result, we have been able to link proteomics data with glycomics data through the implementation of Semantic Web technologies, allowing for more flexible queries across these domains.
The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system ...
Proteins: Structure, Function, and Bioinformatics, 2006
Previous studies have demonstrated that endoglucanase is required for cellulose biosynthesis both in bacteria and plants. However, it has yet to be elucidated how the endoglucanases function in the mechanism of cellulose biosynthesis. Here we describe the crystal structure of the cellulose biosynthesis-related endo-β-1,4-glucanase (CMCax; EC 3.2.1.4) from the cellulose-producing Gram-negative bacterium, Acetobacter xylinum (= Gluconacetobacter xylinus), determined at 1.65-Å resolution. CMCax falls into the glycoside hydrolase family 8 (GH-8), and the structure showed that the overall fold of the CMCax is similar to those of other glycoside hydrolases belonging to GH-8. Structure comparison with Clostridium thermocellum CelA, the best characterized GH-8 endoglucanase, revealed that sugar recognition subsite +3 is completely missing in CMCax. The absence of the subsite +3 leads to significant broadness of the cleft at the cellooligosaccharide reducing-end side. CMCax is known to be a secreted enzyme and is present in the culture medium. However, electron microscopic analysis using immunostaining clearly demonstrated that a portion of CMCax is localized to the cell surface, suggesting a link with other known membrane-anchored endoglucanases that are required for cellulose biosynthesis.
TogoTable (http://togotable.dbcls.jp/) is a web tool that adds user-specified annotations to a table that a user uploads. Annotations are drawn from several biological databases that use the Resource Description Framework (RDF) data model. TogoTable uses database identifiers (IDs) in the table as a query key for searching. RDF data, which form a network called Linked Open Data (LOD), can be searched from SPARQL endpoints using the SPARQL query language. Because TogoTable uses RDF, it can integrate annotations from not only the reference database to which the IDs originally belong, but also externally linked databases via the LOD network. For example, annotations in the Protein Data Bank can be retrieved using GeneID through links provided by the UniProt RDF. Because RDF has been standardized by the World Wide Web Consortium, any database with annotations based on the RDF data model can be easily incorporated into this tool. We believe that TogoTable is a valuable Web tool, particularly for experimental biologists who need to process huge amounts of data such as high-throughput experimental output.
Although cellulose is the most abundant biopolymer in nature, the detailed mechanisms of cellulose biosynthesis remain unknown. Acetobacter xylinum is one of the best-studied model organisms for cellulose biosynthesis. Interestingly, over-expression of the cmcax gene enhances cellulose production in A. xylinum, even though its product (CMCax) has cellulose degradation activity. The addition of CMCax to the medium also promotes cellulose production, suggesting that CMCax is involved in the cellulose synthetic pathway. In the present study, we reveal the regulation mechanism of cmcax expression in A. xylinum. First, we treated cells with four kinds of beta-glucodisaccharide. Using an enzyme assay and real-time quantitative reverse transcriptase polymerase chain reaction (qRT-PCR), we observed an increase in CMCax activity and an induction of cmcax expression by gentiobiose treatment. Therefore, we concluded that gentiobiose induced cmcax expression. Although gentiobiose does not originally exist in the cultivation medium, we have revealed that membrane and intra-cellular proteins extracted from A. xylinum produce gentiobiose from glucose, which is one of the components in the cultivation medium. Furthermore, using real-time qRT-PCR, we confirmed that cmcax expression in a wild-type strain increased gradually after 5 d of cultivation. These results have led us to conclude that the increase in cmcax expression after 5 d of cultivation is caused by the increase in gentiobiose, which could be synthesized by a condensation reaction in A. xylinum. Since CMCax plays a pivotal role in the cellulose production system, our results will contribute to the elucidation of mechanisms of cellulose biosynthesis.
Bioinformatics approaches to carbohydrate research have recently begun using large amounts of protein and carbohydrate data. In this field called glycome informatics, the foremost necessity is a comprehensive resource for genome-scale bioinformatics analysis of glycan data. Although the accumulation of experimental data may be useful as a reference of biological and biochemical information on carbohydrates, this is insufficient for bioinformatics analysis. Thus, we have developed a glycome informatics resource (http://www.genome.jp/kegg/glycan/) in KEGG (Kyoto Encyclopedia of Genes and Genomes), an integrated knowledge base of protein networks, genomic information, and chemical information. This review describes three noteworthy features: (1) GLYCAN, a database of carbohydrate structures; (2) glycan-related pathways; and (3) Composite Structure Map (CSM), a map illustrating all possible variations of carbohydrate structures within organisms. GLYCAN includes two useful tools: an intuitive drawing tool called KegDraw, and an efficient glycan search and alignment tool called KEGG Carbohydrate Matcher (KCaM). KEGG's glycan biosynthesis and metabolism pathways, integrating carbohydrate structures, proteins, and reactions, are also a pivotal resource. CSM is constructed as a bridge between carbohydrate functions and structures. CSM is able to display, for example, expression data of glycosyltransferases in a compact manner. In all the KEGG resources, various objects including KEGG pathways, chemical compounds, as well as carbohydrate structures are commonly represented as graphs, which are widely studied and utilized in the computer science field.
About 14.5 kb of DNA fragments from Acetobacter xylinum ATCC23769 and ATCC53582 were cloned, and their nucleotide sequences were determined. The sequenced DNA regions contained endo-β-1,4-glucanase, cellulose complementing protein, cellulose synthase subunit AB, C, D and β-glucosidase genes. The results from a homology search of deduced amino acid sequences between A. xylinum ATCC23769 and ATCC53582 showed that they were highly similar. However, the amount of cellulose production by ATCC53582 was 5 times larger than that of ATCC23769 during a 7-day incubation. In A. xylinum ATCC53582, synthesis of cellulose continued after glucose was consumed, suggesting that a metabolite of glucose, or a component of the medium other than glucose, may be a substrate of cellulose. On the other hand, cell growth of ATCC23769 was twice that of ATCC53582. Glucose is the energy source in A. xylinum as well as the substrate of cellulose synthesis, and the metabolic pathway of glucose may differ between the two strains. These results suggest that the synthesis of cellulose and the growth of bacterial cells are contradictory.
Glycosyltransferases comprise highly divergent groups of enzymes, which play a central role in the synthesis of complex glycans. Because the repertoire of glycosyltransferases in the genome determines the range of synthesizable glycans, and because an increasing amount of genome sequence data is now available, it is essential to examine these enzymes across organisms to explore possible structures and functions of the glycoconjugates. In this study, we systematically investigated 36 eukaryotic genomes and obtained 3426 glycosyltransferase homologs for biosynthesis of major glycans, classified into 53 families based on sequence similarity. The families were further grouped into six functional categories based on the biosynthetic pathways, which revealed characteristic patterns among organism groups in the degree of conservation and in the number of paralogs. The results also revealed a strong correlation between the number of glycosyltransferases and the number of coding genes in each genome. We then predicted the ability to synthesize major glycan structures including N-glycan precursors and GPI anchors in each organism from the combination of the glycosyltransferase families. This indicates that not only parasitic protists but also some algae are likely to synthesize smaller structures than the structures known to be conserved among a wide range of eukaryotes. Finally, we discuss the functions of two large families, sialyltransferases and β4-glycosyltransferases, by performing finer classifications into subfamilies. Our findings suggest that universality and diversity of glycans originate from two types of evolution of glycosyltransferase families, namely conserved families with few paralogs and diverged families with many paralogs.
Papers by Shin Kawano