Tag: GRAF-pop

dbGaP: Data and analyses from millions of study participants, samples, and trillions of genotypes!

Are you familiar with the well-known Framingham Heart Study, a multi-generation study of residents of Framingham, Massachusetts, begun in 1948? Much of what is now known about the impact of genetics, lifestyle, and diet on cardiovascular health and disease has come from this research study. (See PMC4159698 for a historical perspective.) Did you know that data from this study and over 2,000 other studies that demonstrate the relationship between genetics and medical outcomes and other phenotypes are available from NCBI’s Database of Genotypes and Phenotypes (dbGaP)?

dbGaP was established in 2007 as a repository of human data from large-scale studies. You can access data from more than 2.8 million study participants who have provided over 3.3 million molecular samples. You can retrieve patient-level phenotypic (e.g., demographic, clinical, exposure) data, molecular (e.g., called genotypes, omics, sequence) data, and the results of association analyses from genome-scale case-control and longitudinal studies of heritable diseases.

What types of studies and data are available in dbGaP?

dbGaP contains a wide range of studies and types of data, all relating to human genetic and phenotypic measurements. Most dbGaP data are from NIH-funded research, but recently we have expanded to include non-NIH-funded studies. An easy way to find dbGaP Studies, Phenotype and Molecular Datasets, Variables, Analyses, and Documents is through the dbGaP Advanced Search (Figure 1). The interface lets you filter results by different characteristics, depending on the tab you choose.

Figure 1. The dbGaP Advanced Search interface. Tabs at the top of the web interface allow you to select the studies, datasets, analyses, etc. of interest. Filters (facets) appear on the left (see inset). Click on filters to select the values you want to find. Links on the study summary pages provide direct access to data. Top panel: Studies tab and the corresponding filter categories. Bottom panel: Molecular data tab results with the Study (Framingham SHARe) and Markerset Source (Affymetrix) filters applied.


GRAF, a tool for finding duplicates and closely related samples in large genomic datasets

NCBI’s Genetic Relationship and Fingerprinting (GRAF) tool is a quality assurance tool that can quickly find duplicates and closely related subjects in your data using SNP genotypes.

GRAF-pop, the population tool included in GRAF, computes subject ancestries from genotypes and normalizes ancestry predictions across large datasets collected on different genotyping platforms, making it possible to generate population frequencies based on more than a million dbGaP samples.

Who can use this?

GRAF is a tool for researchers; it is not designed to assess an individual’s ancestry or to find relatives.

You can run this tool on your own large datasets and get results within minutes or hours, even when the genotype missing rate is very high, on the order of 99%. The tool can also check genotype datasets obtained using different chips or platforms and plot them in the same picture for comparison.
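
To make the idea of finding duplicates from SNP genotypes concrete, here is a minimal Python sketch. It is not GRAF's actual algorithm or interface; the function names, the genotype coding (0, 1, or 2 alternate alleles, with None for a missing call), and the thresholds are all assumptions chosen for illustration. It compares two samples only at SNPs where both have a call, which is one simple way to tolerate a high missing rate.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed coding: 0, 1, or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def mismatch_rate(a: Sequence[Genotype], b: Sequence[Genotype],
                  min_shared: int = 1) -> Optional[float]:
    """Fraction of mismatching genotypes among SNPs called in both samples.

    Returns None when fewer than `min_shared` SNPs are called in both.
    """
    if len(a) != len(b):
        raise ValueError("samples must be genotyped on the same SNP list")
    shared = 0
    mismatches = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # skip SNPs missing in either sample
        shared += 1
        if ga != gb:
            mismatches += 1
    if shared < min_shared:
        return None
    return mismatches / shared


def find_candidate_duplicates(samples: Dict[str, Sequence[Genotype]],
                              max_rate: float = 0.05,
                              min_shared: int = 3) -> List[Tuple[str, str]]:
    """Return sample-ID pairs whose mismatch rate is at or below `max_rate`.

    Both thresholds are illustrative, not values used by GRAF.
    """
    pairs = []
    for (id_a, g_a), (id_b, g_b) in combinations(samples.items(), 2):
        rate = mismatch_rate(g_a, g_b, min_shared)
        if rate is not None and rate <= max_rate:
            pairs.append((id_a, id_b))
    return pairs


if __name__ == "__main__":
    toy = {
        "s1": [0, 1, 2, None, 1, 0],
        "s2": [0, 1, 2, 2, None, 0],
        "s3": [2, 0, 0, 1, 1, 2],
    }
    print(find_candidate_duplicates(toy))
```

A real analysis would use far more SNPs and a statistically motivated cutoff. The all-pairs loop here is quadratic in the number of samples, so it only suits small toy inputs.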
