Papers by Patricia Rodriguez-Tome
A Survey on Integrating Data in Bioinformatics
Studies in Computational Intelligence, 2011
Data integration is an open challenge in bioinformatics. Querying and retrieving data from remote and/or local sources and analyzing them are very time-consuming tasks for biologists. Data integration allows biologists to combine knowledge from multiple ...
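Realizzazione di una piattaforma per studi di equivalenza farmaceutica e bioequivalenza mediante modello bioinformatico di correlazione in vitro-in vivo [Development of a platform for pharmaceutical equivalence and bioequivalence studies using a bioinformatic model of in vitro-in vivo correlation]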
Retrovirology, Sep 1, 2013

BMC Bioinformatics, 2014
Background: In recent years, the experimental aspects of laboratory activities have been growing in complexity in terms of the amount and diversity of data produced, the equipment used, and the computer-based workflows needed to process and analyze the raw data generated. To enhance the level of quality control over the laboratory activities and efficiently handle the large amounts of data produced, a Laboratory Management Information System (LIMS) is highly recommended. A LIMS is a complex software platform that helps researchers to have complete knowledge of the laboratory activities at each step, encouraging them to adopt good laboratory practices. Results: We have designed and implemented the Quality and TRacEability Data System (QTREDS), a software platform created to address the specific needs of the CRS4 Sequencing and Genotyping Platform (CSGP). The system, written in the Ruby programming language and developed using the Rails framework, is based on four main functional blocks: a sample handler, a workflow generator, an inventory management system and a user management system. The wizard-based sample handler allows users to manage one or multiple samples at a time, tracking the path of each sample and providing a full chain of custody. The workflow generator encapsulates a user-friendly JavaScript-based visual tool that allows users, even those without a technical background, to design customized workflows. With the inventory management system, reagents, laboratory glassware and consumables can be easily added through their barcodes, and minimum stock levels can be controlled to avoid shortages of essential laboratory supplies. QTREDS provides a system for privilege management and authorization to create different user roles, each with a well-defined access profile.
Conclusions: Tracking and monitoring all the phases of the laboratory activities can help to identify and troubleshoot problems more quickly, reducing the risk of process failures and their related costs. QTREDS was designed to address the specific needs of the CSGP laboratory, where it has been successfully used for over a year, but thanks to its flexibility it can be easily adapted to other "omics" laboratories. The software is freely available for academic users from http://qtreds.crs4.it.
IDENTIFICATION AND BIOINFORMATICS CHARACTERIZATION OF 98 HERV-K(HML-2) CONTAINING PROVIRUSES IN THE HUMAN GENOME ASSEMBLY GRCh37/hg19

Springer eBooks, 1999
Data collections are distributed at many different sites and stored in numerous different database management systems. The industry standard CORBA can help to alleviate the technical problems of distribution and diverging data formats. In a CORBA environment, data structures can be represented using the Interface Definition Language IDL. Manually coding a server, which implements the IDL through calls to the underlying database, is tedious. On the other hand, it is in general impossible to automatically generate the CORBA server because the IDL is not only determined by the schema of the database but also by other factors such as performance requirements. We therefore have developed a method for the semi-automatic generation of CORBA wrappers for relational databases. A declarative language is presented, which is used to describe the mapping between relations and IDL constructs. Using a set of such mapping rules, a CORBA server is generated together with the IDL. Additionally, the server is equipped with a query language based on the IDL. We have implemented a prototype of the system.
Bioinformatics, 1998
Motivation: There are a large number of genetic and physical maps, distributed at many sites. Each site offers different kinds of access methods and viewers. CORBA, the de facto standard for distributed object-oriented computing, offers new opportunities to unify the view on these maps through standard interfaces. A collaboration of Infobiogen and the EBI proposes a common IDL for maps. Results: A CORBA map viewer is presented which serves as a proof of concept for the proposed IDL. It demonstrates its usefulness in the context of map viewing and its ability to handle large maps with >1000 markers. The viewer gives access to the maps of the Radiation Hybrid Database at EBI. It gives a quick overview of several large maps side by side. The marker density at each map position is displayed and different marker types can be highlighted.

Retrovirology, Jan 22, 2016
Background: Human endogenous retroviruses (HERVs) represent the inheritance of ancient germ-line cell infections by exogenous retroviruses and the subsequent transmission of the integrated proviruses to the descendants. ERVs have the same internal structure as exogenous retroviruses. While no replication-competent HERVs have been recognized, some retain up to three of four intact ORFs. HERVs have been classified before, with varying scope and depth, notably in the RepBase/RepeatMasker system. However, existing classifications are bewildering. There is a need for a systematic, unifying and simple classification. We strove for a classification which is traceable to previous classifications and which encompasses HERV variation within a limited number of clades. Results: The human genome assembly GRCh37/hg19 was analyzed with RetroTector, which primarily detects relatively complete Class I and II proviruses. A total of 3173 HERV sequences were identified. The structure of and relations between these proviruses were resolved through a multi-step classification procedure that involved a novel type of similarity image analysis ("Simage") which allowed discrimination of heterogeneous (noncanonical) from homogeneous (canonical) HERVs. Of the 3173 HERVs, 1214 were canonical and segregated into 39 canonical clades (groups), belonging to class I (Gamma- and Epsilon-like), II (Beta-like) and III (Spuma-like). The groups were chosen based on (1) sequence (nucleotide and Pol amino acid) similarity, (2) degree of fit to previously published clades, often from RepBase, and (3) taxonomic markers. The groups fell into 11 supergroups. The 1959 noncanonical HERVs contained 31 additional, less well-defined groups. Simage analysis revealed several types of mosaicism, notably recombination and secondary integration.
By comparing flanking sequences, LTRs and completeness of gene structure, we deduced that some noncanonical HERVs proliferated after the recombination event. Groups were further divided into envelope subgroups (altogether 94) based on sequence similarity and characteristic "immunosuppressive domain" motifs. Intra- and inter(super)group, as well as intraclass, recombination involving envelope genes ("env snatching") was a common event. LTR divergence indicated that HERV-K(HML2) and HERVFC had the most recent integrations, HERVL and HUERSP3 the oldest. Conclusions: A comprehensive HERV classification and characterization approach was undertaken. It should be applicable for classification of all ERVs. Recombination was common among HERV ancestors.
Human Endogenous Retrovirus type W distribution in the human genome: identification and characterization of a new set of proviral sequences in GRCh37/hg19 assembly
Analysis of 98 HERV-K(HML-2) Containing Proviruses Identified in the Human Genome Assembly GRCH37/HG19 by Retrotector and Their Genomic Context
Journal of Cheminformatics, May 1, 2010
Human Endogenous Retrovirus type W characterization in genome assembly GRCh37/hg19: an innovative approach for human diseases investigation

Retrovirology, 2016
Question: Foamy viruses (FV), and in particular PFV, have emerged in recent years as attractive gene therapy vector candidates. Since the lack of knowledge on molecular events in FV replication is a major hurdle for broader usage of foamy virus vectors, we aimed at elucidating PFV biology by investigating interactions of its capsid protein, Gag, with host cell components. Methodology and result: To this end, we identified members of the mammalian PLK family as PFV Gag interactants in a commercial yeast two-hybrid (Y2H) screen and validated these results in detailed Y2H experiments for PLK1-3. In the yeast system, the intact PLK kinase and substrate recognition motifs were required for interactions with PFV Gag, in which a unique S224-T-P226 motif served as a PLK binding determinant. PFV Gag mutants harbouring alanine substitutions of STP residues (iSTP) or phosphomimetic mutations of T225 (pmSTP) failed to interact with PLK1-3 in yeast. These findings were corroborated by colocalization studies of ectopically expressed, fluorescently tagged proteins in mammalian cells, where mCherry-tagged PFV Gag was able to recruit eGFP-tagged PLK1 and 2 to condensed mitotic chromatin in an STP motif-dependent manner. When characterizing PFV virions containing wild type or STP mutant Gag proteins, we observed that the mutations did not interfere with particle assembly, release or reverse transcription, but led to a 70% titer reduction relative to wild type in single-round infection experiments. These replication defects became more prominent in the replication-competent PFV context. Therefore, the lack of Gag STP mutant interaction with PLK proteins upon viral entry into host cells was likely underlying this replication deficit.
This hypothesis was strengthened by the finding that enzymatic PLK inhibition in host cells during transduction with wild type PFV mimicked the replication phenotype of PFV STP mutants. In addition to the overall reduced infectivity of the mutants, we also observed that the STP mutations in particle-associated Gag led to differential sensitivity to integrase inhibition by dolutegravir and resulted in decreased integration efficiency. Conclusions: Taken together, our results demonstrate that PLK proteins influence PFV replication by virtue of their interaction with the Gag protein, ensuring timely and efficient transduction.

A radiation hybrid transcript map of the mouse genome
Nature Genetics, 2001
Expressed-sequence tag (EST) maps are an adjunct to sequence-based analytical methods of gene detection and localization for those species for which such data are available, and provide anchors for high-density homology and orthology mapping in species for which large-scale sequencing has yet to be done. Species for which radiation hybrid-based transcript maps have been established include human, rat, mouse, dog, cat and zebrafish. We have established a comprehensive first-generation-placement radiation hybrid map of the mouse consisting of 5,904 mapped markers (3,993 ESTs and 1,911 sequence-tagged sites (STSs)). The mapped ESTs, which often originate from small-EST clusters, are enriched for genes expressed during early mouse embryogenesis and are probably different from those localized in humans. We have confirmed by in situ hybridization that even singleton ESTs, which are usually not retained for mapping studies, may represent bona fide transcribed sequences. Our studies on mous...

A radiation hybrid map of mouse genes
Nature Genetics, 2001
A comprehensive gene-based map of a genome is a powerful tool for genetic studies and is especially useful for the positional cloning and positional candidate approaches. The availability of gene maps for multiple organisms provides the foundation for detailed conserved-orthology maps showing the correspondence between conserved genomic segments. These maps make it possible to use cross-species information in gene hunts and shed light on the evolutionary forces that shape the genome. Here we report a radiation hybrid map of mouse genes, a combined project of the Whitehead Institute/Massachusetts Institute of Technology Center for Genome Research, the Medical Research Council UK Mouse Genome Centre, and the National Center for Biotechnology Information. The map contains 11,109 genes, screened against the T31 RH panel and positioned relative to a reference map containing 2,280 mouse genetic markers. It includes 3,658 genes homologous to the human genome sequence and provides a framework for overlaying the human genome sequence to the mouse and for sequencing the mouse genome.
Science, 1998
, as demonstrated by the stability of the 100-ns simulation at 300 K started from the native NMR structure. But accurate inclusion of long-range electrostatic effects [U. Essmann et al., J. Chem. Phys. 103, 8577 (1995)] does provide an additional challenge for achieving high parallelism in the MD code. Work is in progress to include long-range electrostatic effects while still achieving a level of parallelism and speed comparable to that presented here.

Nature Precedings, 2010
In the last decade, the development and use of new methods in combinatorial chemistry and high-throughput screening have dramatically increased the number of known biologically active compounds. Paradoxically, the number of drugs reaching the market has not followed the same trend, often because many of the candidate drugs have poor absorption, distribution, metabolism, excretion, and toxicological (ADME-Tox) properties. The ability to recognize and discard bad candidates early in the drug discovery process would save investments of time and money that would otherwise be lost. Machine learning techniques could provide solutions to this problem. The goal of my research is to develop classifiers that accurately discriminate between active and inactive molecules for a specific target. To this end, I am comparing the effectiveness of different machine learning techniques applied to this problem. As a source of data we have selected a set of PubChem's public BioAssays. In addition, with the objective of realizing a real-time query service with our predictors, we aim to keep the features describing the chemical compounds relatively simple. At the end of this process, we should better understand how to build statistical models that are able to recognize molecules active in a specific bioassay, including how to select the most appropriate classification technique, and how to describe compounds in a way that is not excessively resource-consuming to generate, yet contains sufficient information for the classification. We see immediate applications of such technology to recognize compounds with a high risk of toxicity, and also to suggest likely metabolic pathways that would process them.