Using a coarse-grained bead-spring model of bacterial chromosomes of Caulobacter crescentus and Escherichia coli, we show that just 33 and 38 effective cross-links in 4017- and 4642-monomer chains, respectively, at special positions along the chain contour can lead to the large-scale organization of the DNA polymer, where confinement effects of the cell walls play a key role in the organization. The positions of the 33/38 cross-links along the chain contour are chosen from the Hi-C contact maps of the bacteria C. crescentus and E. coli. We represent 1000 base pairs as a coarse-grained monomer in our bead-spring flexible ring polymer model of the DNA polymer. Thus, 4017/4642 beads on a flexible ring polymer represent the C. crescentus/E. coli DNA polymer with 4017/4642 kilo-base pairs. Choosing suitable parameters from Paper I, we also incorporate the role of compaction of the polymer coil due to the presence of molecular crowders and the ability of the chain to release topological constraints. We validate our prediction of the organization of the bacterial chromosomes with available experimental data and also give a prediction of the approximate positions of different segments within the cell. In the absence of confinement, the minimal number of effective cross-links required to organize the DNA chains of 4017/4642 monomers was 60/82 [
We showed in our previous studies that just 3% cross-links at special points along the contour of the bacterial DNA help the DNA polymer to get organized at micron length scales [1, 2]. In this work, we investigate how the release of topological constraints helps in the organization of the DNA polymer. Furthermore, we show that the chain compaction induced by the crowded environment in the bacterial cytoplasm contributes to the organization of the DNA polymer. We model the DNA chain as a flexible bead-spring ring polymer, where each bead represents 1000 base pairs. The specific positions of the cross-links have been taken from the experimental contact maps of the bacteria C. crescentus and E. coli. We introduce different extents of topological constraints in our model by systematically changing the diameter of the monomer bead. It varies from the value where chain crossing can occur freely to the value where chain crossing is disallowed. We also study the role of molecular crowders by introducing an effective Lennard-Jones attraction between the monomers. Using Monte Carlo simulations, we show that the release of topological constraints and the crowded environment play a crucial role in obtaining a unique organization of the polymer.
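The model ingredients named in the abstract above (a bead-spring ring, extra springs for cross-links, a Lennard-Jones pair term whose length scale acts as the bead diameter, and Monte Carlo moves) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the harmonic spring form, all parameter values, and the function names are hypothetical and are not taken from the papers.

```python
import math
import random

def harmonic(r, k=200.0, r0=1.0):
    """Harmonic spring energy for bonded or cross-linked beads (assumed form)."""
    return 0.5 * k * (r - r0) ** 2

def lennard_jones(r, eps=0.3, sigma=0.2):
    """Lennard-Jones pair energy; sigma stands in for the bead diameter."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def make_ring(n, r0=1.0):
    """Place n beads on a planar circle so that neighbours are r0 apart."""
    radius = r0 / (2.0 * math.sin(math.pi / n))
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n), 0.0) for i in range(n)]

def bead_energy(i, pos, cross_links, eps, sigma):
    """Sum of all energy terms that involve bead i."""
    n = len(pos)
    e = harmonic(math.dist(pos[i], pos[(i + 1) % n]))
    e += harmonic(math.dist(pos[i], pos[(i - 1) % n]))
    for a, b in cross_links:  # cross-links act as extra springs
        if i in (a, b):
            e += harmonic(math.dist(pos[a], pos[b]))
    for j in range(n):  # non-bonded pair interactions
        if j != i:
            e += lennard_jones(math.dist(pos[i], pos[j]), eps, sigma)
    return e

def metropolis_step(pos, cross_links, eps=0.3, sigma=0.2, step=0.1, kT=1.0, rng=random):
    """Attempt one single-bead displacement; return True if accepted."""
    i = rng.randrange(len(pos))
    old = pos[i]
    e_old = bead_energy(i, pos, cross_links, eps, sigma)
    pos[i] = tuple(c + rng.uniform(-step, step) for c in old)
    delta = bead_energy(i, pos, cross_links, eps, sigma) - e_old
    if delta > 0 and rng.random() >= math.exp(-delta / kT):
        pos[i] = old  # reject the move and restore the old position
        return False
    return True

if __name__ == "__main__":
    rng = random.Random(0)
    beads = make_ring(50)
    links = [(0, 25), (10, 40)]  # hypothetical cross-link positions
    accepted = sum(metropolis_step(beads, links, rng=rng) for _ in range(2000))
    print("accepted moves:", accepted)
```

In this sketch, lowering `sigma` relative to the bond length makes it easier for chain segments to pass through each other, which is one simple way to mimic a release of topological constraints, while raising `eps` strengthens the effective attraction used to represent crowding.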
Journal of physics. Condensed matter : an Institute of Physics journal, Jan 24, 2018
Using data from contact maps of the DNA polymer of Escherichia coli (E. coli) (at kilobase pair resolution) as an input to our model, we introduce cross-links between monomers in a bead-spring model of a ring polymer at very specific points along the chain. Via suitable Monte Carlo simulations, we show that the presence of these cross-links leads to a particular organization of the chain at large (micron) length scales of the DNA. We also investigate the structure of a ring polymer with an equal number of cross-links at random positions along the chain. We find that though the polymer does get organized at the large length scales, the nature of the organization is quite different from the organization observed with cross-links at specific biologically determined positions. We used the contact map of E. coli bacteria, which has around 4.6 million base pairs in a single circular chromosome. In our coarse-grained flexible ring polymer model, we used 4642 monomer beads and observed that ...
In the fruitfly Drosophila melanogaster, the differential development of wing and haltere is dependent on the function of the Hox protein Ultrabithorax (Ubx). Here we compare Ubx-mediated regulation of wing patterning genes between the honeybee, Apis mellifera, the silkmoth, Bombyx mori, and Drosophila. Orthologues of Ubx are expressed in the third thoracic segment of Apis and Bombyx, although they make functional hindwings. When over-expressed in transgenic Drosophila, Ubx derived from Apis or Bombyx could suppress wing development, suggesting evolutionary changes at the level of co-factors and/or targets of Ubx. To gain further insights into such events, we identified direct targets of Ubx from Apis and Bombyx by ChIP-seq and compared them with those of Drosophila. While the majority of the putative targets of Ubx are species-specific, a considerable number of wing-patterning genes are retained, over the past 300 million years, as targets in all three species. Interestingly, many of these are differentially expressed only between wing and haltere in Drosophila but not between forewing and hindwing in Apis or Bombyx. Detailed bioinformatics and experimental validation of enhancer sequences suggest that, perhaps along with other factors, changes in the cis-regulatory sequences of earlier targets contribute to diversity in Ubx function.
Potato Homeobox 15 (POTH15) is a KNOX-I (Knotted1-like homeobox) family gene in potato that is orthologous to Shoot Meristemless (STM) in Arabidopsis. Despite numerous reports on KNOX genes from different species, studies in potato are limited. Here, we describe photoperiodic regulation of POTH15, its overexpression phenotype, and identification of its potential targets in potato (Solanum tuberosum ssp. andigena). qRT-PCR analysis showed a higher abundance of POTH15 mRNA in shoot tips and stolons under tuber-inducing short-day conditions. POTH15 promoter activity was detected in apical and axillary meristems, stolon tips, tuber eyes, and meristems of tuber sprouts, indicating its role in meristem maintenance and leaf development. POTH15 overexpression altered multiple morphological traits including leaf and stem development, leaflet number, and number of nodes and branches. In particular, the rachis of the leaf was completely reduced and leaves appeared as a bouquet of leaflets. Com...
Diverse data sets have become key building blocks of translational biomedical research. Data types captured and referenced by sophisticated research studies include high throughput genomic and proteomic data, laboratory data, data from imagery, and outcome data. In this paper, the authors present the application of an XML-based data management system to support integration of data from disparate data sources and large data sets. This system facilitates management of XML schemas and on-demand creation and management of XML databases that conform to these schemas. They illustrate the use of this system in an application for genotype–phenotype correlation analyses. This application implements a method of phenotype–genotype correlation based on phylogenetic optimization of large data sets of mouse SNPs and phenotypic data. The application workflow requires the management and integration of genomic information and phenotypic data from external data repositories and from the results of phenotype–genotype correlation analyses. Our implementation supports the process of carrying out a complex workflow that includes large-scale phylogenetic tree optimizations and application of Maddison's concentrated changes test to large phylogenetic tree data sets. The data management system also allows collaborators to share data in a uniform way and supports complex queries that target data sets.
Hox proteins are transcription factors and key regulators of segmental identity along the anterior-posterior axis across all bilaterian animals. Despite decades of research, the mechanisms by which Hox proteins select and regulate their targets remain elusive. We have carried out whole-genome ChIP-chip experiments to identify direct targets of the Hox protein Ultrabithorax (Ubx) during haltere development in Drosophila. Direct targets identified include upstream regulators or cofactors of Ubx. Homothorax, a cofactor of Ubx during embryonic development, is one such target and is required for normal specification of the haltere. Although Ubx-bound sequences are conserved amongst various insect genomes, no consensus Ubx-specific motif was detected. Surprisingly, binding motifs for certain transcription factors that function either upstream or downstream of Ubx are enriched in these sequences, suggesting complex regulatory loops governing Ubx function. Our data support the hypothesis that specificity during Hox target selection is achieved by associating with other transcription factors.
Gene duplication, expansion, and subsequent diversification are features of the evolutionary process. Duplicated genes can be lost, modified, or altered to generate novel functions over evolutionary timescales. These features make gene duplication a powerful engine of evolutionary change. In this study, we explore these features in the MADF-BESS family of transcriptional regulators. In Drosophila melanogaster, the family contains 16 similar members, each containing an N-terminal, DNA-binding MADF domain and a C-terminal, protein-interacting BESS domain. Phylogenetic analysis shows that members of the MADF-BESS family are expanded in the Drosophila lineage. Three members, which we name hinge1, hinge2, and hinge3, are required for wing development, with a critical role in the wing hinge. hinge1 is a negative regulator of Wingless expression and interacts with core wing-hinge patterning genes such as teashirt, homothorax, and jing. Double knockdowns along with heterologous rescue experiments are used to demonstrate that members of the MADF-BESS family retain function in the wing hinge, in spite of expansion and diversification for over 40 million years. The wing hinge connects the blade to the thorax and has critical roles in fluttering during flight. MADF-BESS family genes appear to retain redundant functions to shape and form elements of the wing hinge in a robust and fail-safe manner.
The Asian elephant Elephas maximus and the African elephant Loxodonta africana, which diverged 5-7 million years ago, exhibit differences in their physiology, behaviour and morphology. A comparative genomics approach would be useful and necessary for evolutionary and functional genetic studies of elephants. We sequenced E. maximus and mapped the reads to L. africana at ~15X coverage. Through comparative sequence analyses, we have identified Asian elephant specific homozygous, non-synonymous single nucleotide variants (SNVs) that map to 1514 protein coding genes, many of which are involved in olfaction. We also present the first report of a high-coverage transcriptome sequence in E. maximus from peripheral blood lymphocytes. We have identified 103 novel protein coding transcripts and 66 long non-coding RNAs (lncRNAs). We also report the presence of 181 protein domains unique to elephants when compared to other Afrotheria species. Each of these findings can be further investigated to gain a better understanding of functional differences unique to elephant species, as well as those unique to elephantids in comparison with other mammals. This work therefore provides a valuable resource to explore the immense research potential of comparative analyses of transcriptome and genome sequences in the Asian elephant.
Histones are abundant nuclear proteins that are essential for the packaging of eukaryotic DNA into chromosomes. Different histone variants, in combination with their modification 'code', control regulation of gene expression in diverse cellular processes. Several enzymes that catalyze the addition and removal of multiple histone modifications have been discovered in the past decade, enabling investigations of their role(s) in normal cellular processes and diverse pathological conditions. This sudden influx of data, however, has created a need for an updated knowledgebase that compiles, organizes and presents curated scientific information to the user in an easily accessible format. Here, we present HIstome, a browsable, manually curated, relational database that provides information about human histone proteins, their sites of modifications, variants and modifying enzymes. HIstome is a knowledgebase of 55 human histone proteins, 106 distinct sites of their post-translational modifications (PTMs) and 152 histone-modifying enzymes. Entries have been grouped into 5 types of histones, 8 types of post-translational modifications and 14 types of enzymes that catalyze addition and removal of these modifications. The resource will be useful for epigeneticists, pharmacologists and clinicians. HIstome: The Histone Infobase is available online at http://www.iiserpune.ac.in/~coee/histome/ and http://www.actrec.gov.in/histome/.
Some predict that influenza A H5N1 will be the cause of a pandemic among humans. In preparation for such an event, many governments and organizations have stockpiled antiviral drugs such as oseltamivir (Tamiflu). However, it is known that multiple lineages of H5N1 are already resistant to another class of drugs, adamantane derivatives, and a few lineages are resistant to oseltamivir. What is less well understood is the evolutionary history of the mutations that confer drug resistance in the H5N1 population. In order to address this gap, we conducted phylogenetic analyses of 676 genomic sequences of H5N1 and used the resulting hypotheses as a basis for asking 3 molecular evolutionary questions: (1) Have drug-resistant genotypes arisen in distinct lineages of H5N1 through point mutation or through reassortment? (2) Is there evidence for positive selection on the codons that lead to drug resistance? (3) Is there evidence for covariation between positions in the genome that confer resistance to drugs and other positions, unrelated to drug resistance, that may be under selection for other phenotypes? We also examine how drug-resistant lineages proliferate across the landscape by projecting our phylogenetic analysis onto a virtual globe. Our results for H5N1 show that in most cases drug resistance has arisen by independent point mutations rather than reassortment or covariation. Furthermore, we found that some codons that mediate resistance to adamantane derivatives are under positive selection, but did not find positive selection on codons that mediate resistance to oseltamivir. Together, our phylogenetic methods, molecular evolutionary analyses, and geographic visualization provide a framework for analysis of globally distributed genomic data that can be used to monitor the evolution of drug resistance.
Severe acute respiratory syndrome (SARS) is a novel human illness caused by a previously unrecognized coronavirus (CoV) termed SARS-CoV. There are conflicting reports on the animal reservoir of SARS-CoV. Many of the groups that argue carnivores are the original reservoir of SARS-CoV use a phylogeny to support their argument. However, the phylogenies in these studies often lack the outgroup and rooting criteria necessary to determine the origins of SARS-CoV. Recently, SARS-CoV has been isolated from various species of Chiroptera from China (e.g., Rhinolophus sinicus), thus leading to reconsideration of the original reservoir of SARS-CoV. We evaluated the hypothesis that SARS-CoV isolated from Chiroptera are the original zoonotic source for SARS-CoV by sampling SARS-CoV and non-SARS-CoV from diverse hosts including Chiroptera, carnivores, artiodactyls and humans. Regardless of alignment parameters, optimality criteria, or isolate sampling, the resulting phylogenies clearly show that the SARS-CoV was transmitted to small carnivores well after the epidemic of SARS in humans that began in late 2002. The SARS-CoV isolates from small carnivores in Shenzhen markets form a terminal clade that emerged recently from within the radiation of human SARS-CoV. There is evidence of subsequent exchange of SARS-CoV between humans and carnivores. In addition, SARS-CoV was transmitted independently from humans to farmed pigs (Sus scrofa). The SARS-CoV isolates from Chiroptera are basal to the SARS-CoV clade isolated from humans and carnivores. Although sequence data indicate that Chiroptera are a good candidate for the original reservoir of SARS-CoV, the structural biology of the spike protein of SARS-CoV isolated from Chiroptera suggests that these viruses are not able to interact with the human variant of the receptor of SARS-CoV, angiotensin-converting enzyme 2 (ACE2). In SARS-CoV, we study, both visually and statistically, labile genomic fragments and putative key mutations of the spike protein that may be associated with host shifts. We display host shifts and candidate mutations on trees projected in virtual globes depicting the spread of SARS-CoV. These results suggest that more sampling of coronaviruses from diverse hosts, especially Chiroptera, carnivores and primates, will be required to understand the genomic and biochemical evolution of coronaviruses, including SARS-CoV.
We provide two methods for identifying changes in genotype that are correlated with changes in a phenotype implied by phylogenetic trees. The first method, VENN, works when the number of branches over which the change occurred is modest. VENN looks for genetic changes that are completely penetrant with phenotype changes on a tree. The second method, CCTSWEEP, allows for a partial matching between changes in phenotypes and genotypes and provides a score for each change using Maddison's concentrated changes test. The mutations that are highly correlated with phenotypic change can be ranked by score. We use these methods to find SNPs correlated with resistance to Bacillus anthracis in inbred mouse strains. Our findings are consistent with the current biological literature, and also suggest potential novel candidate genes. Contact: [email protected] for software requests.
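The idea of "completely penetrant" changes described above can be read as a set operation over tree branches. The following is a minimal sketch under that assumed reading, not the authors' VENN software: a genotype change qualifies if it occurs on every branch where the phenotype changes and on no other branch. The function name, input layout, and branch and SNP labels are all hypothetical.

```python
def completely_penetrant(genotype_changes, phenotype_branches):
    """Return genotype changes seen on every phenotype-change branch and nowhere else.

    genotype_changes: dict mapping branch label -> set of genotype change labels
    phenotype_branches: set of branch labels on which the phenotype changes
    (Both inputs are illustrative assumptions about how tree changes are stored.)
    """
    if not phenotype_branches:
        return set()
    # Changes shared by all branches where the phenotype changes.
    on_all = set.intersection(
        *(set(genotype_changes.get(b, ())) for b in phenotype_branches)
    )
    # Changes that also appear on branches without a phenotype change.
    elsewhere = set().union(
        *(changes for b, changes in genotype_changes.items()
          if b not in phenotype_branches)
    )
    return on_all - elsewhere

if __name__ == "__main__":
    changes = {
        "b1": {"snpA", "snpB"},
        "b2": {"snpA", "snpC"},
        "b3": {"snpC"},
    }
    print(completely_penetrant(changes, {"b1", "b2"}))
```

A partial-matching approach such as the one the abstract attributes to CCTSWEEP would replace this all-or-nothing filter with a score per change; that scoring is not sketched here.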
Novel pathogens have the potential to become critical issues of national security, public health and economic welfare. As demonstrated by the response to Severe Acute Respiratory Syndrome (SARS) and influenza, genomic sequencing has become an important method for diagnosing agents of infectious disease. Despite the value of genomic sequences in characterizing novel pathogens, raw data on their own do not provide the information needed by public health officials and researchers. One must integrate knowledge of the genomes of pathogens with host biology and geography to understand the etiology of epidemics. To these ends, we have created an application called Supramap (http://supramap.osu.edu) to put information on the spread of pathogens and key mutations across time, space and various hosts into a geographic information system (GIS). To build this application, we created a web service for integrated sequence alignment and phylogenetic analysis as well as methods to describe the tree, mutations, and host shifts in Keyhole Markup Language (KML). We apply the application to 239 sequences of the polymerase basic 2 (PB2) gene of recent isolates of avian influenza (H5N1). We map a mutation, glutamic acid to lysine at position 627 in the PB2 protein (E627K), in H5N1 influenza that allows for increased replication of the virus in mammals. We use a statistical test to support the hypothesis of a correlation of E627K mutations with avian-mammalian host shifts but reject the hypothesis that lineages with E627K are moving westward. Data, instructions for use, and visualizations are included as supplemental materials at:
Unzipping force analysis of protein association is a technique to investigate protein-DNA interactions by mechanically unzipping DNA. We computationally investigate the limits of this technique under quasistatic conditions. We find the minimum binding energy of a protein for which the protein can be detected using this technique and the minimum distance between the binding sites of two proteins of varying binding energies that can be resolved unambiguously with this technique.
Bulletin of the American Physical Society, Jan 1, 2005
Single nucleotide polymorphisms or SNPs are DNA sequence variations among genomes of a population or other closely related group. While many SNPs have no effect on cell functions, other SNPs predispose an organism to disease and/or influence its response to a drug. ...
Recent years have seen an exponential growth in publicly available genetic data for many organisms. To be scientifically or medically useful, the genetic data must be mapped to the physical traits that the genes in the genotype code. In this dissertation, we describe methods to find correlations between genotypes and phenotypes using phylogenetic trees that can be applied on a genome-wide scale. We first de-
Using a coarse-grained bead-spring model of bacterial chromosomes of Caulobacter crescentus and E... more Using a coarse-grained bead-spring model of bacterial chromosomes of Caulobacter crescentus and Escherichia coli, we show that just 33 and 38 effective cross-links in 4017 and 4642 monomer chains at special positions along the chain contour can lead to the large-scale organization of the DNA polymer, where confinement effects of the cell walls play a key role in the organization. The positions of the 33/38 cross-links along the chain contour are chosen from the Hi-C contact map of bacteria C. crescentus and E. coli. We represent 1000 base pairs as a coarse-grained monomer in our bead-spring flexible ring polymer model of the DNA polymer. Thus, 4017/4642 beads on a flexible ring polymer represent the C. crescentus/E. coli DNA polymer with 4017/4642 kilo-base pairs. Choosing suitable parameters from Paper I, we also incorporate the role of compaction of the polymer coil due to the presence of molecular crowders and the ability of the chain to release topological constraints. We validate our prediction of the organization of the bacterial chromosomes with available experimental data and also give a prediction of the approximate positions of different segments within the cell. In the absence of confinement, the minimal number of effective cross-links required to organize the DNA chains of 4017/4642 monomers was 60/82 [
We showed in our previous studies that just 3% cross-links, at special points along the contour o... more We showed in our previous studies that just 3% cross-links, at special points along the contour of the bacterial DNA help the DNA-polymer to get organized at micron length scales [1, 2]. In this work, we investigate how does the release of topological constraints help in the organization of the DNA-polymer. Furthermore, we show that the chain compaction induced by the crowded environment in the bacterial cytoplasm contributes to the organization of the DNA-polymer. We model the DNA chain as a flexible bead-spring ring polymer, where each bead represents 1000 base pairs. The specific positions of the cross-links have been taken from the experimental contact maps of the bacteria C. crescentus and E. coli. We introduce different extents of topological constraints in our model by systematically changing the diameter of the monomer bead. It varies from the value where the chain crossing can occur freely to the value where the chain crossing is disallowed. We also study the role of molecular crowders by introducing an effective Lennard Jones attraction between the monomers. Using Monte-Carlo simulations, we show that the release of topological constraints and the crowding environment play a crucial role to obtain a unique organization of the polymer.
Journal of physics. Condensed matter : an Institute of Physics journal, Jan 24, 2018
Using data from contact maps of the DNA-polymer of Escherichia coli (E. Coli) (at kilobase pair r... more Using data from contact maps of the DNA-polymer of Escherichia coli (E. Coli) (at kilobase pair resolution) as an input to our model, we introduce cross-links between monomers in a bead-spring model of a ring polymer at very specific points along the chain. Via suitable Monte Carlo simulations, we show that the presence of these cross-links leads to a particular organization of the chain at large (micron) length scales of the DNA. We also investigate the structure of a ring polymer with an equal number of cross-links at random positions along the chain. We find that though the polymer does get organized at the large length scales, the nature of the organization is quite different from the organization observed with cross-links at specific biologically determined positions. We used the contact map of E. Coli bacteria which has around 4.6 million base pairs in a single circular chromosome. In our coarse-grained flexible ring polymer model, we used 4642 monomer beads and observed that ...
In the fruitfly Drosophila melanogaster, the differential development of wing and haltere is depe... more In the fruitfly Drosophila melanogaster, the differential development of wing and haltere is dependent on the function of the Hox protein Ultrabithorax (Ubx). Here we compare Ubx-mediated regulation of wing patterning genes between the honeybee, Apis mellifera, the silkmoth, Bombyx mori and Drosophila. Orthologues of Ubx are expressed in the third thoracic segment of Apis and Bombyx, although they make functional hindwings. When over-expressed in transgenic Drosophila, Ubx derived from Apis or Bombyx could suppress wing development, suggesting evolutionary changes at the level of co-factors and/or targets of Ubx. To gain further insights into such events, we identified direct targets of Ubx from Apis and Bombyx by ChIP-seq and compared them with those of Drosophila. While majority of the putative targets of Ubx are species-specific, a considerable number of wing-patterning genes are retained, over the past 300 millions years, as targets in all the three species. Interestingly, many of these are differentially expressed only between wing and haltere in Drosophila but not between forewing and hindwing in Apis or Bombyx. Detailed bioinformatics and experimental validation of enhancer sequences suggest that, perhaps along with other factors, changes in the cis-regulatory sequences of earlier targets contribute to diversity in Ubx function.
Potato Homeobox 15 (POTH15) is a KNOX-I (Knotted1-like homeobox) family gene in potato that is or... more Potato Homeobox 15 (POTH15) is a KNOX-I (Knotted1-like homeobox) family gene in potato that is orthologous to Shoot Meristemless (STM) in Arabidopsis. Despite numerous reports on KNOX genes from different species, studies in potato are limited. Here, we describe photoperiodic regulation of POTH15, its overexpression phenotype, and identification of its potential targets in potato (Solanum tuberosum ssp. andigena). qRT-PCR analysis showed a higher abundance of POTH15 mRNA in shoot tips and stolons under tuber-inducing short-day conditions. POTH15 promoter activity was detected in apical and axillary meristems, stolon tips, tuber eyes, and meristems of tuber sprouts, indicating its role in meristem maintenance and leaf development. POTH15 overexpression altered multiple morphological traits including leaf and stem development, leaflet number, and number of nodes and branches. In particular, the rachis of the leaf was completely reduced and leaves appeared as a bouquet of leaflets. Com...
Diverse data sets have become key building blocks of translational biomedical research. Data type... more Diverse data sets have become key building blocks of translational biomedical research. Data types captured and referenced by sophisticated research studies include high throughput genomic and proteomic data, laboratory data, data from imagery, and outcome data. In this paper, the authors present the application of an XML-based data management system to support integration of data from disparate data sources and large data sets. This system facilitates management of XML schemas and on-demand creation and management of XML databases that conform to these schemas. They illustrate the use of this system in an application for genotype–phenotype correlation analyses. This application implements a method of phenotype–genotype correlation based on phylogenetic optimization of large data sets of mouse SNPs and phenotypic data. The application workflow requires the management and integration of genomic information and phenotypic data from external data repositories and from the results of phenotype–genotype correlation analyses. Our implementation supports the process of carrying out a complex workflow that includes large-scale phylogenetic tree optimizations and application of Maddison's concentrated changes test to large phylogenetic tree data sets. The data management system also allows collaborators to share data in a uniform way and supports complex queries that target data sets.
Hox proteins are transcription factors and key regulators of segmental identity along the anterior-posterior axis across all bilaterian animals. Despite decades of research, the mechanisms by which Hox proteins select and regulate their targets remain elusive. We have carried out whole-genome ChIP-chip experiments to identify direct targets of the Hox protein Ultrabithorax (Ubx) during haltere development in Drosophila. Direct targets identified include upstream regulators or cofactors of Ubx. Homothorax, a cofactor of Ubx during embryonic development, is one such target and is required for normal specification of the haltere. Although Ubx-bound sequences are conserved amongst various insect genomes, no consensus Ubx-specific motif was detected. Surprisingly, binding motifs for certain transcription factors that function either upstream or downstream of Ubx are enriched in these sequences, suggesting complex regulatory loops governing Ubx function. Our data support the hypothesis that specificity during Hox target selection is achieved by associating with other transcription factors.
Gene duplication, expansion, and subsequent diversification are features of the evolutionary process. Duplicated genes can be lost, modified, or altered to generate novel functions over evolutionary timescales. These features make gene duplication a powerful engine of evolutionary change. In this study, we explore these features in the MADF-BESS family of transcriptional regulators. In Drosophila melanogaster, the family contains 16 similar members, each containing an N-terminal, DNA-binding MADF domain and a C-terminal, protein-interacting BESS domain. Phylogenetic analysis shows that members of the MADF-BESS family are expanded in the Drosophila lineage. Three members, which we name hinge1, hinge2, and hinge3, are required for wing development, with a critical role in the wing hinge. hinge1 is a negative regulator of Wingless expression and interacts with core wing-hinge patterning genes such as teashirt, homothorax, and jing. Double knockdowns along with heterologous rescue experiments are used to demonstrate that members of the MADF-BESS family retain function in the wing hinge, in spite of expansion and diversification for over 40 million years. The wing hinge connects the blade to the thorax and has critical roles in fluttering during flight. MADF-BESS family genes appear to retain redundant functions to shape and form elements of the wing hinge in a robust and fail-safe manner.
The Asian elephant Elephas maximus and the African elephant Loxodonta africana, which diverged 5-7 million years ago, exhibit differences in their physiology, behaviour and morphology. A comparative genomics approach would be useful and necessary for evolutionary and functional genetic studies of elephants. We sequenced E. maximus and mapped the sequence to L. africana at ~15X coverage. Through comparative sequence analyses, we have identified Asian elephant specific homozygous, non-synonymous single nucleotide variants (SNVs) that map to 1514 protein coding genes, many of which are involved in olfaction. We also present the first report of a high-coverage transcriptome sequence in E. maximus from peripheral blood lymphocytes. We have identified 103 novel protein coding transcripts and 66 long non-coding RNAs (lncRNAs). We also report the presence of 181 protein domains unique to elephants when compared to other Afrotheria species. Each of these findings can be further investigated to gain a better understanding of functional differences unique to elephant species, as well as those unique to elephantids in comparison with other mammals. This work therefore provides a valuable resource to explore the immense research potential of comparative analyses of transcriptome and genome sequences in the Asian elephant.
Histones are abundant nuclear proteins that are essential for the packaging of eukaryotic DNA into chromosomes. Different histone variants, in combination with their modification 'code', control regulation of gene expression in diverse cellular processes. Several enzymes that catalyze the addition and removal of multiple histone modifications have been discovered in the past decade, enabling investigations of their role(s) in normal cellular processes and diverse pathological conditions. This sudden influx of data, however, has created a need for an updated knowledgebase that compiles, organizes and presents curated scientific information to the user in an easily accessible format. Here, we present HIstome, a browsable, manually curated, relational database that provides information about human histone proteins, their sites of modifications, variants and modifying enzymes. HIstome is a knowledgebase of 55 human histone proteins, 106 distinct sites of their post-translational modifications (PTMs) and 152 histone-modifying enzymes. Entries have been grouped into 5 types of histones, 8 types of post-translational modifications and 14 types of enzymes that catalyze addition and removal of these modifications. The resource will be useful for epigeneticists, pharmacologists and clinicians. HIstome: The Histone Infobase is available online at http://www.iiserpune.ac.in/∼coee/histome/ and http://www.actrec.gov.in/histome/.
Some predict that influenza A H5N1 will be the cause of a pandemic among humans. In preparation for such an event, many governments and organizations have stockpiled antiviral drugs such as oseltamivir (Tamiflu). However, it is known that multiple lineages of H5N1 are already resistant to another class of drugs, adamantane derivatives, and a few lineages are resistant to oseltamivir. What is less well understood is the evolutionary history of the mutations that confer drug resistance in the H5N1 population. In order to address this gap, we conducted phylogenetic analyses of 676 genomic sequences of H5N1 and used the resulting hypotheses as a basis for asking 3 molecular evolutionary questions: (1) Have drug-resistant genotypes arisen in distinct lineages of H5N1 through point mutation or through reassortment? (2) Is there evidence for positive selection on the codons that lead to drug resistance? (3) Is there evidence for covariation between positions in the genome that confer resistance to drugs and other positions, unrelated to drug resistance, that may be under selection for other phenotypes? We also examine how drug-resistant lineages proliferate across the landscape by projecting our phylogenetic analysis onto a virtual globe. Our results for H5N1 show that in most cases drug resistance has arisen by independent point mutations rather than reassortment or covariation. Furthermore, we found that some codons that mediate resistance to adamantane derivatives are under positive selection, but did not find positive selection on codons that mediate resistance to oseltamivir. Together, our phylogenetic methods, molecular evolutionary analyses, and geographic visualization provide a framework for analysis of globally distributed genomic data that can be used to monitor the evolution of drug resistance.
Severe acute respiratory syndrome (SARS) is a novel human illness caused by a previously unrecognized coronavirus (CoV) termed SARS-CoV. There are conflicting reports on the animal reservoir of SARS-CoV. Many of the groups that argue carnivores are the original reservoir of SARS-CoV use a phylogeny to support their argument. However, the phylogenies in these studies often lack outgroup and rooting criteria necessary to determine the origins of SARS-CoV. Recently, SARS-CoV has been isolated from various species of Chiroptera from China (e.g., Rhinolophus sinicus), thus leading to reconsideration of the original reservoir of SARS-CoV. We evaluated the hypothesis that SARS-CoV isolated from Chiroptera are the original zoonotic source for SARS-CoV by sampling SARS-CoV and non-SARS-CoV from diverse hosts including Chiroptera, carnivores, artiodactyls and humans. Regardless of alignment parameters, optimality criteria, or isolate sampling, the resulting phylogenies clearly show that the SARS-CoV was transmitted to small carnivores well after the epidemic of SARS in humans that began in late 2002. The SARS-CoV isolates from small carnivores in Shenzhen markets form a terminal clade that emerged recently from within the radiation of human SARS-CoV. There is evidence of subsequent exchange of SARS-CoV between humans and carnivores. In addition, SARS-CoV was transmitted independently from humans to farmed pigs (Sus scrofa). The SARS-CoV isolates from Chiroptera are basal to the SARS-CoV clade isolated from humans and carnivores. Although sequence data indicate that Chiroptera are a good candidate for the original reservoir of SARS-CoV, the structural biology of the spike protein of SARS-CoV isolated from Chiroptera suggests that these viruses are not able to interact with the human variant of the receptor of SARS-CoV, angiotensin-converting enzyme 2 (ACE2). In SARS-CoV, we study, both visually and statistically, labile genomic fragments and putative key mutations of the spike protein that may be associated with host shifts. We display host shifts and candidate mutations on trees projected in virtual globes depicting the spread of SARS-CoV. These results suggest that more sampling of coronaviruses from diverse hosts, especially Chiroptera, carnivores and primates, will be required to understand the genomic and biochemical evolution of coronaviruses, including SARS-CoV.
We provide two methods for identifying changes in genotype that are correlated with changes in a phenotype implied by phylogenetic trees. The first method, VENN, works when the number of branches over which the change occurred is modest. VENN looks for genetic changes that are completely penetrant with phenotype changes on a tree. The second method, CCTSWEEP, allows for a partial matching between changes in phenotypes and genotypes and provides a score for each change using Maddison's concentrated changes test. The mutations that are highly correlated with phenotypic change can be ranked by score. We use these methods to find SNPs correlated with resistance to Bacillus anthracis in inbred mouse strains. Our findings are consistent with the current biological literature, and also suggest potential novel candidate genes. Contact: [email protected] for software requests.
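The idea of "completely penetrant" changes can be pictured as a set intersection. The sketch below is only an illustration under assumptions, not the VENN software itself: the function name, the branch labels, and the SNP identifiers are all hypothetical, and the real method may apply further criteria (for example, excluding changes that also occur on branches with no phenotype change).

```python
def fully_penetrant_changes(branch_mutations, phenotype_branches):
    """Return the mutations that occur on every phenotype-changing branch.

    branch_mutations: dict mapping a branch label to the set of mutations
        inferred on that branch (hypothetical input format).
    phenotype_branches: labels of branches on which the phenotype changes.
    """
    sets = [set(branch_mutations.get(b, ())) for b in phenotype_branches]
    if not sets:
        return set()
    # A mutation is kept only if it is present on all listed branches.
    return set.intersection(*sets)


# Toy data, invented purely for illustration.
branch_mutations = {
    "b1": {"snp_12", "snp_40"},
    "b2": {"snp_12", "snp_77"},
    "b3": {"snp_40"},
}
phenotype_branches = ["b1", "b2"]

candidates = fully_penetrant_changes(branch_mutations, phenotype_branches)
print(sorted(candidates))
```

A partial-matching approach such as the one the abstract attributes to CCTSWEEP would instead score each mutation rather than require presence on every branch.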
Novel pathogens have the potential to become critical issues of national security, public health and economic welfare. As demonstrated by the response to Severe Acute Respiratory Syndrome (SARS) and influenza, genomic sequencing has become an important method for diagnosing agents of infectious disease. Despite the value of genomic sequences in characterizing novel pathogens, raw data on their own do not provide the information needed by public health officials and researchers. One must integrate knowledge of the genomes of pathogens with host biology and geography to understand the etiology of epidemics. To these ends, we have created an application called Supramap (http://supramap.osu.edu) to put information on the spread of pathogens and key mutations across time, space and various hosts into a geographic information system (GIS). To build this application, we created a web service for integrated sequence alignment and phylogenetic analysis as well as methods to describe the tree, mutations, and host shifts in Keyhole Markup Language (KML). We apply the application to 239 sequences of the polymerase basic 2 (PB2) gene of recent isolates of avian influenza (H5N1). We map a mutation, glutamic acid to lysine at position 627 in the PB2 protein (E627K), in H5N1 influenza that allows for increased replication of the virus in mammals. We use a statistical test to support the hypothesis of a correlation of E627K mutations with avian-mammalian host shifts but reject the hypothesis that lineages with E627K are moving westward. Data, instructions for use, and visualizations are included as supplemental materials at:
Unzipping force analysis of protein association is a technique to investigate protein-DNA interactions by mechanically unzipping DNA. We computationally investigate the limits of this technique under quasistatic conditions. We find the minimum binding energy of a protein for which the protein can be detected using this technique and the minimum distance between the binding sites of two proteins of varying binding energies that can be resolved unambiguously with this technique.
Bulletin of the American Physical Society, Jan 1, 2005
Single nucleotide polymorphisms, or SNPs, are DNA sequence variations among genomes of a population or other closely related group. While many SNPs have no effect on cell functions, other SNPs predispose an organism to disease and/or influence its response to a drug. ...
Recent years have seen an exponential growth in publicly available genetic data for many organisms. To be scientifically or medically useful, the genetic data must be mapped to the physical traits that the genes in the genotype code. In this dissertation, we describe methods to find correlations between genotypes and phenotypes using phylogenetic trees that can be applied on a genome-wide scale. We first de-
Papers by Farhat Habib