Papers by Juan Antonio Vizcaino

Proteomics, Jan 9, 2015
In a global effort for scientific transparency, it has become feasible and good practice to share experimental data supporting novel findings. Consequently, the amount of publicly available mass spectrometry-based proteomics data has grown substantially in recent years. With some notable exceptions, this extensive material has however largely been left untouched. The time has now come for the proteomics community to utilize this potential gold mine for new discoveries, and uncover its untapped potential. In this review, we provide a brief history of the sharing of proteomics data, showing ways in which publicly available proteomics data are already being (re-)used, and outline potential future opportunities based on four different usage types: use, reuse, reprocess and repurpose. We thus aim to assist the proteomics community in stepping up to the challenge, and to make the most of the rapidly increasing amount of public proteomics data.

Antimicrobial agents and chemotherapy, Jan 20, 2015
Candida infection has emerged as a critical healthcare burden worldwide, owing to the formation of robust biofilms against common antifungals. Recent evidence shows that multidrug-tolerant persisters critically account for biofilm recalcitrance, whereas their underlying biological mechanisms are poorly understood. Here, we first investigated the phenotypic characteristics of Candida biofilm persisters under consecutive harsh treatments of amphotericin B. The prolonged treatments effectively killed the majority of cells in biofilms derived from representative strains of Candida albicans, Candida glabrata and Candida tropicalis, but failed to eradicate a small fraction of persisters. Next, we explored the tolerance mechanisms of the persisters through investigating the proteomic profiles of C. albicans biofilm persister fractions by liquid chromatography-tandem mass spectrometry. The C. albicans biofilm persisters displayed a specific proteomic signature with an array of 205 differenti...
Journal of Proteome Research, 2015
This paper summarizes the recent activities of the Chromosome-Centric Human Proteome Project (C-HPP) consortium, which develops new technologies to identify yet-to-be annotated proteins (termed "missing proteins") in biological samples that lack sufficient experimental evidence at the protein level for confident protein identification. The C-HPP also aims to identify new protein forms that may be caused by genetic variability, post-translational modifications, and alternative splicing. Proteogenomic data integration forms the basis of the C-HPP's activities; therefore, we have summarized some of the key approaches and their roles in the project. We present new analytical technologies that improve the chemical space and lower detection limits coupled with bioinformatics tools and some publicly

Bioinformatics (Oxford, England), Jan 24, 2015
The ms-data-core-api is a free, open-source library for developing computational proteomics tools and pipelines. The Application Program Interface, written in Java, enables rapid tool creation by providing a robust, pluggable programming interface and common data model. The data model is based on controlled vocabularies/ontologies and captures the whole range of data types included in common proteomics experimental workflows, going from spectra to identifications to quantitative results. The library contains readers for three of the most used Proteomics Standards Initiative standard file formats: mzML, mzIdentML, and mzTab. In addition to mzML, it also supports other common mass spectra formats: dta, ms2, mgf, pkl, apl (text-based), mzXML and mzData (XML-based). Also, it can be used to read PRIDE XML, the original format used by the PRIDE database, one of the world-leading proteomics resources. Finally, we present a set of algorithms and tools whose implementation illustrates the si...

Journal of the American Medical Informatics Association : JAMIA, Jan 28, 2015
To describe the goals of the Proteomics Standards Initiative (PSI) of the Human Proteome Organization, the methods that the PSI has employed to create data standards, the resulting output of the PSI, lessons learned from the PSI's evolution, and future directions and synergies for the group. The PSI has 5 categories of deliverables that have guided the group. These are minimum information guidelines, data formats, controlled vocabularies, resources and software tools, and dissemination activities. These deliverables are produced via the leadership and working group organization of the initiative, driven by frequent workshops and ongoing communication within the working groups. Official standards are subjected to a rigorous document process that includes several levels of peer review prior to release. We have produced and published minimum information guidelines describing what information should be provided when making data public, either via public repositories or other means. ...
The HUPO Proteomics Standards Initiative has developed several standardized data formats to facilitate data sharing in mass spectrometry (MS)-based proteomics. These allow researchers to report their complete results in a unified way. However, at present, there is no format to describe the final qualitative and quantitative results for proteomics and metabolomics experiments in a simple tabular format. Many downstream analysis use cases are only concerned with the final results of an experiment and require an easily accessible format, compatible with tools such as Microsoft Excel or R.

Data Mining in Proteomics: From Standards to Applications, 2011
The Proteomics Identifications Database (PRIDE, http://www.ebi.ac.uk/pride) provides users with the ability to explore and compare mass spectrometry-based proteomics experiments that reveal details of the protein expression found in a broad range of taxonomic groups, tissues, and disease states. A PRIDE experiment typically includes identifications of proteins, peptides, and protein modifications. Additionally, many of the submitted experiments also include the mass spectra that provide the evidence for these identifications. Finally, one of the strongest advantages of PRIDE in comparison with other proteomics repositories is the amount of metadata it contains, a key point to put the above-mentioned data in biological and/or technical context. Several informatics tools have been developed in support of the PRIDE database. The most recent one is called "Database on Demand" (DoD), which allows custom sequence databases to be built in order to optimize the results from search engines. We describe the use of DoD in this chapter. Additionally, in order to show the potential of PRIDE as a source for data mining, we also explore complex queries using federated BioMart queries to integrate PRIDE data with other resources, such as Ensembl, Reactome, or UniProt.

Current Protocols in Protein Science, 2001
The Proteomics Identifications database (PRIDE, http://www.ebi.ac.uk/pride) is one of the main repositories designed to store, disseminate, and analyze mass spectrometry-based proteomics datasets. In this unit, an overview of the PRIDE system is given, including its key satellite tools: the Ontology Lookup Service (OLS), the Protein Identifier Cross-Referencing Service (PICR), and Database on Demand (DoD). Also described in detail are procedures for submitting data to PRIDE, and accessing data stored in PRIDE using the BioMart interface. Finally, to demonstrate the potential of PRIDE as a source for data mining, an example protocol is provided to showcase the powerful cross-domain query capabilities available through a combination of BioMarts.

PLoS Computational Biology, 2014
Over the past several years fungal infections have shown an increasing incidence in the susceptible population, and caused high mortality rates. In parallel, multi-resistant fungi are emerging in human infections. Therefore, the identification of new potential antifungal targets is a priority. The first task of this study was to analyse the protein domain and domain architecture content of the 137 fungal proteomes (corresponding to 111 species) available in UniProtKB (UniProt KnowledgeBase) by January 2013. The resulting list of core and exclusive domain and domain architectures is provided in this paper. It delineates the different levels of fungal taxonomic classification: phylum, subphylum, order, genus and species. The analysis highlighted Aspergillus as the most diverse genus in terms of exclusive domain content. In addition, we also investigated which domains could be considered promiscuous in the different organisms. As an application of this analysis, we explored three different ways to detect potential targets for antifungal drugs. First, we compared the domain and domain architecture content of the human and fungal proteomes, and identified those domains and domain architectures only present in fungi. Secondly, we looked for information regarding fungal pathways in public repositories, where proteins containing promiscuous domains could be involved. Three pathways were identified as a result: lovastatin biosynthesis, xylan degradation and biosynthesis of siroheme. Finally, we classified a subset of the studied fungi in five groups depending on their occurrence in clinical samples. We then looked for exclusive domains in the groups that were more relevant clinically and determined which of them had the potential to bind small molecules.
Overall, this study provides a comprehensive analysis of the available fungal proteomes and shows three approaches that can be used as a first step in the detection of new antifungal targets. Citation: Barrera A, Alastruey-Izquierdo A, Martín MJ, Cuesta I, Vizcaíno JA (2014) Analysis of the Protein Domain and Domain Architecture Content in Fungi and

Molecular & Cellular Proteomics, 2014
Quality control is increasingly recognized as a crucial aspect of mass spectrometry-based proteomics. Several recent papers discuss relevant parameters for quality control and present applications to extract these from the instrumental raw data. What has been missing, however, is a standard data exchange format for reporting these performance metrics. We therefore developed the qcML format, an XML-based standard that follows the design principles of the related mzML, mzIdentML, mzQuantML, and TraML standards from the HUPO-PSI (Proteomics Standards Initiative). In addition to the XML format, we also provide tools for the calculation of a wide range of quality metrics as well as a database format and interconversion tools, so that existing LIMS systems can easily add relational storage of the quality control data to their existing schema. We here describe the qcML specification, along with possible use cases and an illustrative example of the subsequent analysis possibilities. All information about qcML is available at http://code.google.com/p/qcml.
Biochimica et biophysica acta, 2013
This paper focuses on the use of controlled vocabularies (CVs) and ontologies, especially in the area of proteomics, primarily related to the work of the Proteomics Standards Initiative (PSI). It describes the relevant proteomics standard formats and the ontologies used within them. Software and tools for working with these ontology files are also discussed. The article also examines the "mapping files" used to ensure that correct controlled vocabulary terms are placed within PSI standards and that the MIAPE (Minimum Information about a Proteomics Experiment) requirements are fulfilled.

Database : the journal of biological databases and curation, 2013
Controlled vocabularies (CVs), i.e. a collection of predefined terms describing a modeling domain, used for the semantic annotation of data, and ontologies are used in structured data formats and databases to avoid inconsistencies in annotation, to have a unique (and preferably short) accession number and to give researchers and computer algorithms the possibility for more expressive semantic annotation of data. The Human Proteome Organization (HUPO)-Proteomics Standards Initiative (PSI) makes extensive use of ontologies/CVs in their data formats. The PSI-Mass Spectrometry (MS) CV contains all the terms used in the PSI MS-related data standards. The CV contains a logical hierarchical structure to ensure ease of maintenance and the development of software that makes use of complex semantics. The CV contains terms required for a complete description of an MS analysis pipeline used in proteomics, including sample labeling, digestion enzymes, instrumentation parts and parameters, software used for identification and quantification of peptides/proteins and the parameters and scores used to determine their significance. Owing to the range of topics covered by the CV, collaborative development across several PSI working groups, including proteomics research groups, instrument manufacturers and software vendors, was necessary. In this article, we describe the overall structure of the CV, the process by which it has been developed and is maintained and the dependencies on other ontologies.
BMC Microbiology, 2009
Background: It has recently been shown that the Trichoderma fungal species used for biocontrol of plant diseases are capable of interacting with plant roots directly, behaving as symbiotic microorganisms. With a view to providing further information at transcriptomic level about the early response of Trichoderma to a host plant, we developed a high-density oligonucleotide (HDO) microarray encompassing 14,081 Expressed Sequence Tag (EST)-based transcripts from eight Trichoderma spp. and 9,121 genome-derived transcripts of T. reesei, and we have used this microarray to examine the gene expression of T. harzianum either alone or in the presence of tomato plants, chitin, or glucose.

Fungal Genetics and Biology, 2007
In the present article, we describe the cloning and characterization of the Trichoderma harzianum hmgR gene encoding a hydroxymethylglutaryl CoA reductase (HMGR), a key enzyme in the biosynthesis of terpene compounds. In T. harzianum, partial silencing of the hmgR gene gave rise to transformants with a higher level of sensitivity to lovastatin, a competitive inhibitor of the HMGR enzyme. In addition, these hmgR-silenced transformants produced lower levels of ergosterol than the wild-type strain in a minimal medium containing lovastatin. The silenced transformants showed a decrease in hmgR gene expression (up to 8.4-fold, after 72 h of incubation), together with an increase in the expression of erg7 (up to 15.8-fold, after 72 h of incubation), a gene involved in the biosynthesis of triterpenes. Finally, hmgR-silenced transformants showed a reduction in their antifungal activity against the plant-pathogen fungi Rhizoctonia solani and Fusarium oxysporum.

Mycological Research, 2005
Methanol extracts from 24 Trichoderma isolates, selected as biocontrol agents and representing different species and genotypes from three of the four taxonomic sections of this genus (T. sect. Trichoderma, T. sect. Pachybasium and T. sect. Longibrachiatum), were screened for antibacterial, anti-yeast and antifungal activities against a panel of seven bacteria, seven yeasts and six filamentous fungi previously used in similar studies. Two different growth media were tested (potato dextrose broth and CYS80), and all isolates included in the antimicrobial tests showed at least one inhibitory activity against one of the target microorganisms in one of the two culture media. No statistically significant differences were detected in the number of active strains between the two culture media, but the highest number of inhibitory strains against bacteria and fungi was found in Trichoderma sect. Pachybasium, whereas strains from T. sect. Longibrachiatum showed the highest anti-yeast values. In all cases, a correlation was found between the strains that were active against yeasts and fungi. However, some degree of variability was detected for strains within the same taxonomic section. In general terms, strains from T. asperellum (mainly in CYS80 medium) and T. longibrachiatum gave the best non-enzymatic antimicrobial profiles.
Applied Microbiology and Biotechnology, 2007

Nature Biotechnology, 2014
There is a growing trend towards public dissemination of proteomics data, which is facilitating the assessment, reuse, comparative analyses and extraction of new findings from published data 1, 2. This process has been mainly driven by journal publication guidelines and funding agencies. However, there is a need for better integration of public repositories and coordinated sharing of all the pieces of information needed to represent a full mass spectrometry (MS)-based proteomics experiment. Your July 2009 editorial "Credit where credit is overdue" 3 exposed the situation in the proteomics field, where full data disclosure is still not common practice. Olsen and Mann 4 identified different levels of information in the typical experiment, starting from raw data and going through peptide identification and quantification, protein identifications and ratios and the resulting biological conclusions. All of these levels should be captured and properly annotated in public databases, using the existing MS proteomics repositories for the MS data (raw data, identification and quantification results) and metadata, whereas the resulting biological information should be integrated in protein knowledgebases, such as UniProt 5. A recent editorial in Nature Methods 6 again highlighted the need for a stable repository for raw MS proteomics data. In this Correspondence, we report on the first implementation of the ProteomeXchange consortium, an integrated framework for submission and dissemination of MS-based proteomics data.
Journal of Proteomics, 2010
Although data deposition is not yet general practice in the field of proteomics, several mass spectrometry (MS)-based proteomics repositories are publicly available to the scientific community.