Papers by Yasset Perez-Riverol
bigbio/jmzTab: Release 3.0.12
Version of jmzTab library

Accuracy vs. feature selection combination for expression datasets (1, 3, 4, 5, 6 and 7)
(RF) Random Forest without a previous feature selection step; (X2-CM-RFE-RF) random forest classification after a feature selection step using a univariate correlation filter with matrix correlation and recursive feature elimination; (X2-PCA-RFE-RF) random forest classification after a feature selection step using a univariate correlation filter with principal component analysis and recursive feature elimination. All methods include an internal 10-fold cross-validation step. All accuracy metrics were estimated following the approach previously reported by Pochet et al. [31], where 20-fold randomized test data were used to summarize the accuracy of the FS combination.
GA4GH/TOOL-REGISTRY-SERVICE-SCHEMAS: 2.0.1-BETA.0
For public discussion. Includes service-info, filter by descriptor language, transition to OpenAPI 3.0 only (which introduces some formatting changes) as opposed to being converted from Swagger by swagger2openapi behind the scenes, transition to new doc system, add Galaxy.
Systematic Integration of Millions of Peptidoforms Evidences into ENSEMBL Genome Browser
Systems biology
ms-data-core-api: an open-source, metadata-oriented library for computational proteomics

CHAPTER 19. Cross-platform Software Development and Distribution with Bioconda and BioContainers
Bioinformatics software development has become a cornerstone of modern biology research. Large-scale quantitative biology studies have created a demand for more complex workflows and data analysis pipelines. Challenges in reproducing bioinformatics analyses are compounded by the fact that the programs themselves are difficult to install on computers because they rely on software libraries, compilers, other files, and environment variables, collectively called dependencies, that are assumed to be available and are thus often poorly documented. The Bioconda and BioContainers communities have created a complete ecosystem that allows bioinformatics software to be installed and executed in an isolated and controlled environment. The ecosystem also provides infrastructure and basic guidelines to create and distribute bioinformatics containers with a special focus on omics technologies. These cross-platform containers can be integrated into more comprehensive bioinformatics pipelines and differe...

ArXiv, 2020
Cuban science and technology are known for important achievements, particularly in human healthcare and biotechnology. During the second half of the twentieth century, the country developed a system of scientific institutions to address and solve major economic, cultural, social and health problems. However, the economic crisis faced by the island during the last three decades has had a major impact on Cuban scientific research. In addition to decreased investment, the emigration of thousands of young as well as senior scientists to other countries has had a major impact on Cuban research output. To date, no systematic analysis of scientific publications, citations, or patents granted to Cuban authors during this period is available. Here, an analysis of Cuban scientific production since 1970, with a special focus on the last three decades (1990-2019), is provided. All national metrics are compared with those of other countries, emphasizing those from Latin America. Preliminary results ...
PRIDE Inspector Toolsuite: Moving towards a quality assessment tool for proteomics data standards and ProteomeXchange repositories
BioContainers is an open-source project that aims to create, store, and distribute bioinformatics software containers and packages. The BioContainers community has developed a set of guidelines to standardize the software containers, including the metadata, versions, licenses, and/or software dependencies. BioContainers supports multiple packaging and container technologies such as Conda, Docker, and Singularity. Here, we introduce the BioContainers Registry and RESTful API to make containerized bioinformatics tools more findable, accessible, interoperable, and reusable (FAIR). The BioContainers Registry provides a fast and convenient way to find and retrieve bioinformatics tool packages and containers. By doing so, it will increase the use of bioinformatics packages and containers while promoting replicability and reproducibility in research.

Congenital Heart Disease (CHD) affects approximately 7-9 children per 1000 live births. Numerous genetic studies have established a role for rare genomic variants at the copy number variation (CNV) and single nucleotide variant level. In particular, the role of de novo mutations (DNM) has been highlighted in syndromic and non-syndromic CHD. To identify novel haploinsufficient CHD disease genes we performed an integrative analysis of CNVs and DNMs identified in probands with CHD including cases with sporadic thoracic aortic aneurysm (TAA). We assembled CNV data from 7,958 cases and 14,082 controls and performed a gene-wise analysis of the burden of rare genomic deletions in cases versus controls. In addition, we performed mutation rate testing for DNMs identified in 2,489 parent-offspring trios. Our combined analysis revealed 21 genes which were significantly affected by rare genomic deletions and/or constrained non-synonymous de novo mutations in probands. Fourteen of these genes ha...
Author response for "The European Bioinformatics Community for Mass Spectrometry (EuBIC-MS): an open community for bioinformatics training and research"

Journal of Proteome Research, 2019
Label-free quantification has become common practice in many mass spectrometry-based proteomics experiments. In recent years, we and others have shown that spectral clustering can considerably improve the analysis of (primarily large-scale) proteomics data sets. Here we show that spectral clustering can be used to infer additional peptide-spectrum matches and improve the quality of label-free quantitative proteomics data, even in data sets containing only tens of MS runs. We analyzed four well-known public benchmark data sets that represent different experimental settings using spectral counting and peak-intensity-based label-free quantification. In both approaches, the peptide-spectrum matches additionally inferred through our spectra-cluster algorithm improved the detectability of low-abundance proteins while increasing the accuracy of the derived quantitative data, without increasing the data sets' noise. Additionally, we developed a Proteome Discoverer node for our spectra-cluster algorithm which allows anyone to rebuild our proposed pipeline using the free version of Proteome Discoverer.

Nucleic Acids Research, 2012
The PRoteomics IDEntifications (PRIDE, http://www.ebi.ac.uk/pride) database at the European Bioinformatics Institute is one of the most prominent data repositories of mass spectrometry (MS)-based proteomics data. Here, we summarize recent developments in the PRIDE database and related tools. First, we provide up-to-date statistics in data content, splitting the figures by groups of organisms and species, including peptide and protein identifications, and post-translational modifications. We then describe the tools that are part of the PRIDE submission pipeline, especially the recently developed PRIDE Converter 2 (new submission tool) and PRIDE Inspector (visualization and analysis tool). We also give an update about the integration of PRIDE with other MS proteomics resources in the context of the ProteomeXchange consortium. Finally, we briefly review the quality control efforts that are ongoing at present and outline our future plans.

Database, 2013
The Java BioWareHouse (JBioWH) project is an open-source, platform-independent programming framework that allows users to build their own integrated database from the most popular data sources. JBioWH can be used for intensive querying of multiple data sources and the creation of streamlined task-specific data sets on local PCs. JBioWH is based on a MySQL relational database schema and includes Java API parser functions for retrieving data from 20 public databases (e.g. NCBI, KEGG). It also includes a client desktop application for (non-programmer) users to query data. In addition, JBioWH can be tailored for use in specific circumstances, including the handling of massive queries for high-throughput analyses or CPU-intensive calculations. The framework is provided with complete documentation and application examples and can be downloaded from the project web site at http://code.google.com/p/jbiowh. A MySQL server is available for demonstration purposes at hydrax.icgeb.trieste.it:3307.
Bioinformatics, 2013
Protein identification by mass spectrometry is commonly accomplished using a peptide sequence matching search algorithm, whose sensitivity varies inversely with the size of the sequence database and the number of post-translational modifications considered. We present the Spectrum Identification Machine, a peptide sequence matching tool that capitalizes on the high-intensity b1-fragment ion of tandem mass spectra of peptides coupled in solution with phenylisothiocyanate to confidently sequence the first amino acid and ultimately reduce the search space. We demonstrate that in complex search spaces, a gain of some 120% in sensitivity can be achieved.
Metaproteomics, the characterization of proteins expressed by microbiomes, presents a range of technical challenges, from sampling to data processing and interpretation. In the iPRG 2020 study, we investigated the status of metaproteomics data analysis workflows by posing questions to the study participants. In two phases of the study, the participants were asked to deduce the organisms or taxa in a metaproteomics sample ("What species are represented in the sample?") and what biological phenomena have taken place ("What interactions took place between the species in the mixture?"). The outputs from these studies will be presented at the RG session at ABRF 2021.

Motivation: Omics Discovery Index (OmicsDI, www.omicsdi.org) is an integrated and open-source platform to facilitate the discovery and dissemination of omics dataset metadata. It provides a unique infrastructure to integrate datasets coming from multiple omics studies, including at present proteomics, genomics, transcriptomics, metabolomics, and systems biology. The OmicsDI architecture was originally implemented and deployed in a dedicated high-performance computing cluster, limiting scalability and dynamic allocation of resources by the data processing pipelines. In addition, the original OmicsDI resource could not be reused by independent laboratories and research groups to share and disseminate their data. Results: Here, we present a new version of OmicsDI that can be easily deployed in cloud architectures and local infrastructures, enabling the development of a Federated OmicsDI. The new architecture can be automatically synchronized with the main OmicsDI resource, increasing the integration with other omics data providers. Also, the proposed cloud-based architecture is more scalable, providing better capabilities to manage the increase of data providers and datasets.
Open source libraries and frameworks for biological data visualisation: A guide for developers
Proteomics, Feb 5, 2015
To the Editor: Your editorial “Credit where credit is overdue” [1] aptly summarized the existing situation in the proteomics field, where full data disclosure remains very much a work in progress. Importantly, it also correctly pointed out that ‘the software provided by the public repositories for searching and analysing proteomics data is not as efficient and as user friendly as it could be’. We therefore here introduce PRIDE Inspector