The Registry of Open Data on AWS is now available on AWS Data Exchange
All datasets on the Registry of Open Data are now discoverable on AWS Data Exchange alongside 3,000+ existing data products from category-leading data providers across industries. Explore the catalog to find open, free, and commercial data sets. Learn more about AWS Data Exchange

About

This registry exists to help people discover and share datasets that are available via AWS resources. See recent additions and learn more about sharing data on AWS.

Get started using data quickly by viewing all tutorials with associated SageMaker Studio Lab notebooks.

See all usage examples for datasets listed in this registry.

See datasets from EPA, Allen Institute for Artificial Intelligence (AI2), Digital Earth Africa, Data for Good at Meta, NASA Space Act Agreement, NIH STRIDES, NOAA Open Data Dissemination Program, Space Telescope Science Institute, and Amazon Sustainability Data Initiative.




Add to this registry

If you want to add a dataset or example of how to use a dataset to this registry, please follow the instructions on the Registry of Open Data on AWS GitHub repository.

Unless specifically stated in the applicable dataset documentation, datasets available through the Registry of Open Data on AWS are not provided and maintained by AWS. Datasets are provided and maintained by a variety of third parties under a variety of licenses. Please check dataset licenses and related documentation to determine if a dataset may be used for your application.


Tell us about your project

If you have a project using a listed dataset, please tell us about it. We may work with you to feature your project in a blog post.

The Human Sleep Project

bioinformatics, deep learning, life sciences, machine learning, medicine, neurophysiology, neuroscience

The Human Sleep Project (HSP) sleep physiology dataset is a growing collection of clinical polysomnography (PSG) recordings. Beginning with PSG recordings from ~15K patients evaluated at the Massachusetts General Hospital, the HSP will grow over the coming years to include data from >200K patients, as well as people evaluated outside of the clinical setting. This data is being used to develop CAISR (Complete AI Sleep Report), a collection of deep neural networks, rule-based algorithms, and signal processing approaches designed to provide better-than-human detection of conventional PSG...

Details →

Usage examples

See 37 usage examples →

Common Crawl

encyclopedic, internet, natural language processing, web archive

A corpus of web crawl data composed of over 300 billion web pages.

Details →

Usage examples

See 36 usage examples →

The Cancer Genome Atlas

cancer, genomic, life sciences, STRIDES, whole genome sequencing

The Cancer Genome Atlas (TCGA), a collaboration between the National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI), aims to generate comprehensive, multi-dimensional maps of the key genomic changes in major types and subtypes of cancer. TCGA has analyzed matched tumor and normal tissues from 11,000 patients, allowing for the comprehensive characterization of 33 cancer types and subtypes, including 10 rare cancers. The dataset contains open Clinical Supplement, Biospecimen Supplement, RNA-Seq Gene Expression Quantification, miRNA-Seq Isoform Expression Quantificati...

Details →

Usage examples

See 29 usage examples →

CCRS MODIS albedo over Canada | Albédo MODIS du CCT couvrant le Canada

analysis ready data, broadband, cog, earth observation, satellite imagery

Time series of 10-day spectral and broadband albedo products derived at 250-m spatial resolution over Canadian territory and neighboring areas, produced at the Canada Centre for Remote Sensing (CCRS) since February 2000 using MODIS L1B C6.1 swath imagery as input. The imagery for all spectral bands was downscaled and re-projected into the Lambert Conformal Conic (LCC) projection at 250-m spatial resolution. The area size is 5,700 km x 4,800 km (22,800 pixels x 19,200 lines).
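The quoted raster size follows directly from dividing the area extent by the 250-m pixel spacing. A minimal sketch of that arithmetic (the function name is illustrative, not part of any CCRS tool):

```python
def grid_shape(extent_km: tuple[float, float], resolution_m: float) -> tuple[int, int]:
    """Return (columns, rows) for a rectangular area tiled at a fixed pixel size."""
    width_km, height_km = extent_km
    # Convert kilometres to metres, then divide by the pixel size to count pixels per axis.
    cols = round(width_km * 1000 / resolution_m)
    rows = round(height_km * 1000 / resolution_m)
    return cols, rows


cols, rows = grid_shape((5_700, 4_800), 250)
print(cols, rows)                          # 22800 19200
print(f"{cols * rows:,} pixels per band")  # 437,760,000 pixels per band
```

The per-band pixel count is a quick way to estimate memory before loading a full-extent band.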

Details →

Usage examples

See 24 usage examples →

Foldingathome COVID-19 Datasets

alchemical free energy calculations, biomolecular modeling, coronavirus, COVID-19, foldingathome, health, life sciences, molecular dynamics, protein, SARS-CoV-2, simulations, structural biology

Folding@home is a massively distributed computing project that uses biomolecular simulations to investigate the molecular origins of disease and accelerate the discovery of new therapies. Run by the Folding@home Consortium, a worldwide network of research laboratories focusing on a variety of different diseases, Folding@home seeks to address problems in human health on a scale that is infeasible by any other means, sharing the results of these large-scale studies with the research community through peer-reviewed publications and publicly shared datasets. During the COVID-19 pandemic, Folding@home focused its resources on understanding the vulnerabilities in SARS-CoV-2, the virus that causes COVID-19, working closely with a number of experimental collaborators to accelerate progress toward effective therapies for treating COVID-19 and ending the pandemic. In the process, it created the world's first exascale distributed computing resource, enabling it to generate valuable scientific datasets of unprecedented size. More information about Folding@home's COVID-19 research activities is available on the Folding@home COVID-19 page. In addition to working directly with experimental collaborators and rapidly sharing new research findings through preprint servers, Folding@home has joined other researchers in committing to rapidly share all COVID-19 research data, and has joined forces with AWS and the Molecular Sciences Software Institute (MolSSI) to share datasets of unprecedented size through the AWS Open Data Registry, indexing these massive datasets via the MolSSI COVID-19 Molecular Structure and Therapeutics Hub. The complete index of all Folding@home datasets can be found here. Th...

Details →

Usage examples

See 24 usage examples →

Sentinel-2

agriculture, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac

The Sentinel-2 mission is a land monitoring constellation of two satellites that provide high-resolution optical imagery and continuity for the current SPOT and Landsat missions. The mission provides global coverage of the Earth's land surface every 5 days, making the data of great use in ongoing studies. L1C data are available globally from June 2015. L2A data are available over the Europe region from November 2016 and globally since January 2017.
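Because L1C and L2A coverage start at different dates, and L2A starts earlier over Europe, a script that requests a given processing level for an arbitrary date should check availability first. A minimal sketch using the month-level start dates above (the exact start days are not stated, so the first of each month is assumed; `in_europe` is a hypothetical flag that the caller must determine):

```python
from datetime import date

# Month-level start dates from the description; the day of month is an assumption.
L1C_GLOBAL_START = date(2015, 6, 1)
L2A_EUROPE_START = date(2016, 11, 1)
L2A_GLOBAL_START = date(2017, 1, 1)


def available_levels(day: date, in_europe: bool) -> list[str]:
    """Return the Sentinel-2 processing levels expected to exist for an acquisition date."""
    levels = []
    if day >= L1C_GLOBAL_START:
        levels.append("L1C")
    # L2A began earlier over Europe than elsewhere.
    l2a_start = L2A_EUROPE_START if in_europe else L2A_GLOBAL_START
    if day >= l2a_start:
        levels.append("L2A")
    return levels


print(available_levels(date(2016, 12, 15), in_europe=True))   # ['L1C', 'L2A']
print(available_levels(date(2016, 12, 15), in_europe=False))  # ['L1C']
```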

Details →

Usage examples

See 24 usage examples →

Therapeutically Applicable Research to Generate Effective Treatments (TARGET)

cancer, genomic, life sciences, STRIDES, whole genome sequencing

Therapeutically Applicable Research to Generate Effective Treatments (TARGET) is the collaborative effort of a large, diverse consortium of extramural and NCI investigators. The goal of the effort is to accelerate molecular discoveries that drive the initiation and progression of hard-to-treat childhood cancers and facilitate rapid translation of those findings into the clinic. TARGET projects provide comprehensive molecular characterization to determine the genetic changes that drive the initiation and progression of childhood cancers. The dataset contains open Clinical Supplement, Biospecimen...

Details →

Usage examples

See 24 usage examples →

USGS Landsat

agriculture, cog, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac

This joint NASA/USGS program provides the longest continuous space-based record of Earth’s land in existence. Every day, Landsat satellites provide essential information to help land managers and policy makers make wise decisions about our resources and our environment. Data is provided for Landsats 1, 2, 3, 4, 5, 7, 8, and 9 (excluding Landsat 6). As of June 28, 2023 (announcement), the previous single SNS topic arn:aws:sns:us-west-2:673253540267:public-c2-notify was replaced with three new SNS topics for different types of scenes.
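An SNS topic ARN encodes the region and owning account that a subscription has to reference. A minimal sketch that splits an ARN into its standard fields (`arn:partition:service:region:account-id:resource`); it does no validation against AWS, and the replacement topics' ARNs are not listed here, so only the retired topic is used:

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN of the form arn:partition:service:region:account-id:resource."""
    parts = arn.split(":", 5)  # maxsplit=5 keeps any ':' inside the resource part intact
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


old_topic = parse_arn("arn:aws:sns:us-west-2:673253540267:public-c2-notify")
print(old_topic.region, old_topic.resource)  # us-west-2 public-c2-notify
```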

Details →

Usage examples

See 23 usage examples →

Allen Cell Imaging Collections

biology, cell biology, cell imaging, Homo sapiens, image processing, life sciences, machine learning, microscopy

This bucket contains multiple datasets (as Quilt packages) created by the Allen Institute for Cell Science. The types of data included in this bucket are listed below:

  1. Field of view or cropped images of cells
  2. Segmentations of structures in the images (e.g., boundaries of cells, DNA, other intracellular structures, etc.)
  3. Processed versions of the above images and segmentations
  4. Machine learning predictions and labels of the data listed above
  5. Models trained on the previously listed data
  6. Additional supporting non-image data related to the above listed data types (e.g., gene expression data, whole genome sequencing data, features derived from the images or model predictions, metadata)
  7. Simulation, analysis, and visualization data of in silico cell structures, cells, and cell populations
Extern...

Details →

Usage examples

See 20 usage examples →

Sudachi Language Resources

natural language processing

Japanese dictionaries and pre-trained models (word embeddings and language models) for natural language processing. SudachiDict is the dictionary for Sudachi, a Japanese tokenizer (morphological analyzer). chiVe is a set of Japanese pretrained word embeddings (word vectors), trained using the ultra-large-scale web corpus NWJC by National...

Details →

Usage examples

See 20 usage examples →

CZ CELLxGENE Discover Census

bioinformatics, cell biology, life sciences, single-cell transcriptomics, transcriptomics

CZ CELLxGENE Discover (cellxgene.cziscience.com) is a free-to-use platform for the exploration, analysis, and retrieval of single-cell data. CZ CELLxGENE Discover hosts the largest aggregation of standardized single-cell data from the major human and mouse tissues, with modalities that include gene expression, chromatin accessibility, DNA methylation, and spatial transcriptomics. This year, CZ CELLxGENE Discover has made available all of its human and mouse RNA single-cell data through Census (https://chanzuckerberg.github.io/cellxgene-census/) – a free-to-use service with an API and data that...

Details →

Usage examples

See 19 usage examples →

Gabriella Miller Kids First Pediatric Research Program (Kids First)

cancer, genetic, genomic, Homo sapiens, life sciences, pediatric, STRIDES, structural birth defect, whole genome sequencing

The NIH Common Fund's Gabriella Miller Kids First Pediatric Research Program’s (“Kids First”) vision is to “alleviate suffering from childhood cancer and structural birth defects by fostering collaborative research to uncover the etiology of these diseases and by supporting data sharing within the pediatric research community.” The program continues to generate and share whole genome sequence data from thousands of children affected by these conditions, ranging from rare pediatric cancers, such as osteosarcoma, to more prevalent diagnoses, such as congenital heart defects. In 2018, Kids Fi...

Details →

Usage examples

See 19 usage examples →

NOAA Geostationary Operational Environmental Satellites (GOES) 16, 17, 18 & 19

agriculture, disaster response, earth observation, geospatial, meteorological, satellite imagery, weather



NEW GOES-19 Data!! On April 4, 2025 at 1500 UTC, the GOES-19 satellite will be declared the Operational GOES-East satellite. All products and services, including NODD, for GOES-East will transition to GOES-19 data at that time. GOES-19 will operate out of the GOES-East location of 75.2°W starting on April 1, 2025 and through the operational transition. Until the transition time and during the final stretch of Post Launch Product Testing (PLPT), GOES-19 products are considered non-operational regardless of their validation maturity level. Shortly following the transition of GOES-19 to GOES-East, all data distribution from GOES-16 will be turned off. GOES-16 will drift to the storage location at 104.7°W. GOES-19 data should begin flowing again on April 4th once this maneuver is complete.

NEW GOES-16 Reprocess Data!! The reprocessed GOES-16 ABI L1b data mitigate systematic data issues (including data gaps and image artifacts) seen in the Operational products, and improve the stability of both the radiometric and geometric calibration over the entire mission life. These data were produced by recomputing the L1b radiance products from input raw L0 data using improved calibration algorithms and look-up tables, derived from data analysis of the NIST-traceable, on-board sources. In addition, the reprocessed data products contain enhancements to the L1b file format, including limb pixels and pixel timestamps, while maintaining compatibility with the operational products. The datasets currently available span the operational life of GOES-16 ABI, from early 2018 through the end of 2024. The Reprocessed L1b dataset shows improvement over the Operational L1b products but may still contain data gaps or discrepancies. Please provide feedback to Dan Lindsey and Gary Lin. More information can be found in the [GOES-R ABI Reprocess User Guide](https://github.com/NOAA-Big-Data-Program/nodd-data-docs/blob/main/GOES/GOES-R_ABI_Reprocessed_L1b_User_Guide-v1.1.pdf).

NOTICE: As of January 10, 2023, GOES-18 assumed the GOES-West position, and all data files are deemed both operational and provisional, so no ‘preliminary, non-operational’ caveat is needed. GOES-17 is now offline and has shifted to approximately 105 degrees West, where it will remain in on-orbit storage. GOES-17 data will no longer flow into the GOES-17 bucket. Operational GOES-West products can be found in the GOES-18 bucket.
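Because the operational East and West products moved between satellites (and therefore buckets), code that pulls imagery for a time range has to pick the right satellite. A minimal sketch using only the two transitions described in these notices; the function name is illustrative, the time of day for the January 10, 2023 switch is not stated (midnight UTC is assumed), and history before GOES-16/GOES-17 is not modeled:

```python
from datetime import datetime, timezone

# Transition times from the notices above.
GOES_EAST_SWITCH = datetime(2025, 4, 4, 15, 0, tzinfo=timezone.utc)  # GOES-16 -> GOES-19
GOES_WEST_SWITCH = datetime(2023, 1, 10, tzinfo=timezone.utc)        # GOES-17 -> GOES-18 (time of day assumed)


def operational_satellite(position: str, when: datetime) -> str:
    """Return the satellite serving as operational GOES-East or GOES-West at `when`.

    `when` must be timezone-aware; comparing a naive datetime raises TypeError.
    """
    if position == "east":
        return "GOES-19" if when >= GOES_EAST_SWITCH else "GOES-16"
    if position == "west":
        return "GOES-18" if when >= GOES_WEST_SWITCH else "GOES-17"
    raise ValueError("position must be 'east' or 'west'")


print(operational_satellite("east", datetime(2025, 4, 1, tzinfo=timezone.utc)))  # GOES-16
```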

GOES satellites (GOES-16, GOES-17, GOES-18 & GOES-19) provide continuous weather imagery and monitoring of meteorological and space environment data across North America. GO...

Details →

Usage examples

See 19 usage examples →

Sentinel-2 Cloud-Optimized GeoTIFFs

agriculture, cog, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac

The Sentinel-2 mission is a land monitoring constellation of two satellites that provide high-resolution optical imagery and continuity for the current SPOT and Landsat missions. The mission provides global coverage of the Earth's land surface every 5 days, making the data of great use in ongoing studies. This dataset is the same as the Sentinel-2 dataset, except that the JP2K files were converted into Cloud-Optimized GeoTIFFs (COGs). Additionally, SpatioTemporal Asset Catalog (STAC) metadata has been added in a JSON file alongside the data, and a STAC API called Earth-search is freely available t...
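A STAC API is queried by POSTing an Item Search body to its `/search` endpoint. A minimal sketch that builds such a body with the standard `collections`, `bbox`, `datetime`, and `limit` fields; the endpoint URL is not given here, and the collection ID `sentinel-2-l2a` is an assumption to check against the catalog's `/collections` listing:

```python
import json
from datetime import date


def stac_search_body(collection: str, bbox: tuple[float, float, float, float],
                     start: date, end: date, limit: int = 10) -> str:
    """Build a JSON body for a STAC API Item Search (POST /search) request."""
    west, south, east, north = bbox
    # Antimeridian-crossing boxes (west > east) are valid STAC but not handled here.
    if west > east or south > north:
        raise ValueError("bbox must be (west, south, east, north)")
    body = {
        "collections": [collection],
        "bbox": [west, south, east, north],
        # RFC 3339 interval covering whole days in UTC.
        "datetime": f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z",
        "limit": limit,
    }
    return json.dumps(body)


# Hypothetical query: London area, June 2023, assumed collection ID.
print(stac_search_body("sentinel-2-l2a", (-0.5, 51.3, 0.3, 51.7), date(2023, 6, 1), date(2023, 6, 30)))
```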

Details →

Usage examples

See 19 usage examples →

Terrain Tiles

agriculture, disaster response, earth observation, elevation, geospatial

A global dataset providing bare-earth terrain heights, tiled for easy usage and provided on S3.
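Tiled elevation data is usually addressed by zoom/x/y. Assuming these tiles follow the common Web Mercator XYZ ("slippy map") scheme, which the description above does not state, the tile containing a point can be found as below. The bucket name and key layout are not given here, so the sketch stops at the tile index:

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the Web Mercator XYZ tile containing a point."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    # Mercator y: asinh(tan(lat)) == ln(tan(lat) + sec(lat)); y grows southward.
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp points on the east edge (lon = 180) or beyond the Mercator latitude limit.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


print(lonlat_to_tile(0.0, 0.0, 1))  # (1, 1)
```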

Details →

Usage examples

See 19 usage examples →

NASA Prediction of Worldwide Energy Resources (POWER)

agriculture, air quality, analytics, archives, atmosphere, climate, climate model, data assimilation, deep learning, earth observation, energy, environmental, forecast, geoscience, geospatial, global, history, imaging, industry, machine learning, machine translation, metadata, meteorological, model, netcdf, opendap, radiation, satellite imagery, solar, statistics, sustainability, time series forecasting, water, weather, zarr

NASA's goal in Earth science is to observe, understand, and model the Earth system to discover how it is changing, to better predict change, and to understand the consequences for life on Earth. The Applied Sciences Program, within the Earth Science Division of the NASA Science Mission Directorate, serves individuals and organizations around the globe by expanding and accelerating societal and economic benefits derived from Earth science, information, and technology research and development.

The Prediction Of Worldwide Energy Resources (POWER) Project, funded through the Applied Sciences Program at NASA Langley Research Center, gathers NASA Earth observation data and parameters related to the fields of surface solar irradiance and meteorology to serve the public in several free, easy-to-access and easy-to-use methods. POWER helps communities become resilient amid observed climate variability by improving data accessibility, aiding research in energy development, building energy efficiency, and supporting agriculture projects.

The POWER project contains over 380 satellite-derived meteorology and solar energy parameters as Analysis Ready Data (ARD) at four temporal levels: hourly, daily, monthly, and climatology. The POWER data archive provides data at the native resolution of the source products. The data is updated nightly to maintain near-real-time availability (a latency of 2-3 days for meteorological parameters and 5-7 days for solar parameters). The POWER services catalog consists of a series of RESTful Application Programming Interfaces, geospatially enabled image services, and a web-mapping Data Access Viewer. These three service offerings support data discovery, access, and distribution to the project’s user base as ARD and as direct application inputs to decision support tools.
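When requesting recent days, the quoted latency determines the newest date worth asking for. A minimal sketch that uses the upper end of each latency range; the group keys `"meteorology"` and `"solar"` are illustrative labels, not POWER API parameter names:

```python
from datetime import date, timedelta

# Upper ends of the near-real-time latencies quoted above, in days.
MAX_LATENCY_DAYS = {"meteorology": 3, "solar": 7}


def latest_safe_date(today: date, parameter_group: str) -> date:
    """Latest date you can reasonably expect to be available, using worst-case latency."""
    try:
        lag = MAX_LATENCY_DAYS[parameter_group]
    except KeyError:
        raise ValueError(f"unknown parameter group: {parameter_group!r}") from None
    return today - timedelta(days=lag)


print(latest_safe_date(date(2024, 3, 10), "meteorology"))  # 2024-03-07
print(latest_safe_date(date(2024, 3, 10), "solar"))        # 2024-03-03
```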

The latest data version update includes hourly...

Details →

Usage examples

See 18 usage examples →

NEXRAD on AWS

agriculture, earth observation, meteorological, natural resource, weather

Real-time and archival data from the Next Generation Weather Radar (NEXRAD) network.

Update

The NEXRAD Level II archive data is moving to a new bucket: unidata-nexrad-level2 and SNS topic: arn:aws:sns:us-east-1:684042711724:NewNEXRADLevel2Archive. The old bucket and SNS topic are now deprecated and will no longer be available starting September 1, 2025.
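Code that reads Level II files has to point at the new bucket before the old one is retired. A minimal sketch that builds a per-site, per-day listing prefix for `unidata-nexrad-level2`; the `YYYY/MM/DD/SITE/` key layout and four-character site IDs (such as KTLX) are assumptions not stated on this page, so verify them against an actual listing (for example with `aws s3 ls --no-sign-request`):

```python
from datetime import date

NEW_BUCKET = "unidata-nexrad-level2"  # from the update above


def level2_prefix(day: date, site: str) -> str:
    """Build an S3 key prefix for one radar site and day.

    ASSUMPTION: keys are laid out as YYYY/MM/DD/SITE/.
    """
    site = site.upper()
    if len(site) != 4 or not site.isalnum():
        raise ValueError("expected a four-character radar site ID such as KTLX")
    return f"{day:%Y/%m/%d}/{site}/"


print(f"s3://{NEW_BUCKET}/{level2_prefix(date(2025, 5, 20), 'ktlx')}")
# s3://unidata-nexrad-level2/2025/05/20/KTLX/
```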

Details →

Usage examples

See 18 usage examples →

1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5, 3.7, 4.0, 4.2, and 4.4

bam, bioinformatics, biology, cram, genetic, genomic, genotyping, life sciences, machine learning, population genetics, short read sequencing, structural variation, tertiary analysis, variant annotation, whole genome sequencing

Description

Overview

This dataset contains alignment files and small variant (including single nucleotide variants (SNVs) and indels), copy number variant (CNV), short tandem repeat (i.e., repeat expansion; STR), structural variant (SV), and other variant call files from the 1000 Genomes Project (1KGP) Phase 3 dataset (3,202 individuals, 602 trios), generated using Illumina DRAGEN v3.5.7b, v3.7.6, v4.0.3, v4.2.7, and v4.4.7 software. All DRAGEN analyses were performed in the cloud using the Illumina Connected Analytics bioinformatics platform powered by Amazon Web Services (see 'Data solution empowering population genomics' for more information). The v3.7.6, v4.2.7, and v4.4.7 datasets include results from trio small variant, de novo structural variant, and de novo copy number variant calls on 602 trio families composed of members from the 1KGP Phase 3 dataset. Trio repeat expansion calling was included in the v3.7.6 dataset only. Joint cohort analysis was also performed on the entire 1KGP sample dataset for the v3.7.6, v4.0.3, v4.2.7, and v4.4.7 reanalyses using DRAGEN Iterative gVCF Genotyper v3.8.3, v4.2.0, v4.2.7, and v4.4.7, respectively (see 'Genotyping variants at population scale using DRAGEN gVCF Genotyper' and 'Population Genotyping').
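The overview packs several version-specific facts into one paragraph. A minimal sketch that restates them as a lookup table for picking a release that has the analyses you need; the structure and names are illustrative, and trio repeat expansion calling (v3.7.6 only) is not encoded:

```python
# DRAGEN releases in this dataset -> gVCF Genotyper version used for joint cohort
# analysis (None where none is described) and whether trio analyses
# (small variant, de novo SV, de novo CNV) were run.
RELEASES = {
    "3.5.7b": {"joint_genotyper": None, "trio": False},
    "3.7.6": {"joint_genotyper": "3.8.3", "trio": True},
    "4.0.3": {"joint_genotyper": "4.2.0", "trio": False},
    "4.2.7": {"joint_genotyper": "4.2.7", "trio": True},
    "4.4.7": {"joint_genotyper": "4.4.7", "trio": True},
}


def releases_with(joint: bool = False, trio: bool = False) -> list[str]:
    """Return the DRAGEN versions that include the requested analyses."""
    return [
        version
        for version, info in RELEASES.items()
        if (not joint or info["joint_genotyper"] is not None)
        and (not trio or info["trio"])
    ]


print(releases_with(joint=True, trio=True))  # ['3.7.6', '4.2.7', '4.4.7']
```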

DRAGEN Versions

v3.7

User Guide | Release Notes

Improvements and new features in the v3.7.6 individual-sample analyses include CYP2D6 variant calling (see 'Overcoming high homology to detect variation in CYP21A2 with whole-genome sequencing in DRAGEN') and joint detection and use of graph-based hg19 and hg38 reference hash tables (see 'DRAGEN Wins at PrecisionFDA Truth Challenge V2 Showcase Accuracy Gains from Alt-aware Mapping and Graph Reference Genomes' and 'Demystifying the versions of GRCh38/hg38 reference genomes, how they are used in DRAGEN and their impact on accuracy' for details).

v4.0

User Guide | Release Notes

The DRAGEN v4.0.3 dataset features improved small variant calling accuracy due to a newly integrated machine learning functionality with an updated graph-based reference for difficult-to-map regions (see 'DRAGEN Sets New Standard for Data Accuracy in PrecisionFDA Benchmark Data' and 'Optimizing Variant Calling Performance with Illumina Machine Learning and DRAGEN Graph'); accuracy and runtime improvements in the SV caller; new targeted callers including CYP2B6, GBA, SMN, and a Star Allele PGx caller; and an expanded catalog for use with the Expansion Hunter STR caller.

v4.2

User Guide | Release Notes

DRAGEN v4.2.7 offers significant accuracy improvements in small variant, CNV, and SV calling, includes new targeted callers (HBA, LPA, RH, CYP21A2, SMN silent carrier variant), and supports Star Allele calling for five additional pharmacogenes (BCHE, ABCG2, NAT2, F5, and UGT2B17). These are further improved by upgraded machine learning models. See 'DRAGEN 4.2: Enhanced machine learning, new targeted callers, and more' for further details on these and other enhancements.

v4.4

User Guide | Release Notes

DRAGEN v4.4.7 boosts the speed and accuracy of all callers via the official release of an optimized pangenome graph reference (see 'The quest for accuracy gains in the dark regions of the genomes: Presenting the DRAGEN multigenome mapper and pangenome reference updates in version 4.3'). In particular, SV calling accuracy is substantially increased by a multigenome mapper capable of exploiting the power of a pangenome reference. Runtime is further reduced through support for Amazon EC2 F2 instances (see 'Enabling Rapid Genomic and Multiomic Data Analysis with Illumina DRAGEN™ v4.4 on Amazon EC2 F2 Instances').

Annotation

Starting with the v4.0.3 reanalysis, annotation using the Illumina Connected Annotations (also known as Illumina Annotation Engine or Nirvana) was included as part of the analysis (see Illumina Connected Annotations documentation ...

Details →

Usage examples

See 17 usage examples →

Inter-mission Time Series of Land Ice Velocity and Elevation (ITS_LIVE)

cog, earth observation, geophysics, geospatial, global, ice, netcdf, satellite imagery, stac, zarr

The Inter-mission Time Series of Land Ice Velocity and Elevation (ITS_LIVE) project has a singular mission: to accelerate ice sheet and glacier research by producing globally comprehensive, high resolution, low latency, temporally dense, multi-sensor records of land ice and ice shelf change while minimizing barriers between the data and the user. ITS_LIVE data currently consists of NetCDF Level 2 scene-pair ice flow products posted to a standard 120 m grid derived from Landsat 4/5/7/8/9, Sentinel-2 optical scenes, and Sentinel-1 SAR scenes. We have processed all land-ice intersecting image pai...

Details →

Usage examples

See 16 usage examples →

MERRA-2 tavg1_2d_slv_Nx: 2d,1-Hourly,Time-Averaged,Single-Level,Assimilation,Single-Level Diagnostics 0.625 x 0.5 degree

agriculture, air temperature, atmosphere, biodiversity, climate, coastal, datacenter, ecosystems, global, hydrology, ice, land, metadata, netcdf, oceans, opendap, water

M2T1NXSLV (or tavg1_2d_slv_Nx) is an hourly time-averaged 2-dimensional data collection in Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of meteorology diagnostics at commonly used vertical levels, such as air temperature at 2 meters (or at 10 meters, 850 hPa, 500 hPa, 250 hPa), wind components at 50 meters (or at 2 meters, 10 meters, 850 hPa, 500 hPa, 250 hPa), sea level pressure, surface pressure, and total precipitable water vapor (or ice water, liquid water). The data field is time-stamped with the central time of each hour starting from 00:30 UTC, i.e., 00:30, 01:30, …, 23:30 UTC. MERRA-2 is the latest version of global atmospheric reanalysis for the satellite era produced by the NASA Global Modeling and Assimilation Office (GMAO) using the Goddard Earth Observing System Model (GEOS) version 5.12.4. The dataset covers the period of 1980-present with a latency of ~3 weeks after the end of a month.

Data Reprocessing: Please check “Records of MERRA-2 Data Reprocessing and Service Changes” linked from the “Documentation” tab on this page. Note that a reprocessed data filename is different from the original file.

MERRA-2 Mailing List: Sign up to receive information on reprocessing of data, changes to tools and services, and data announcements from GMAO. Contact the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov) to be added to the list.

Questions: If you have a question, please read the “MERRA-2 File Specification Document”, the “MERRA-2 Data Access – Quick Start Guide”, and the FAQs linked from the “Documentation” tab on this page. If that does not answer your question, you may post it to the NASA Earthdata Forum (forum.earthdata.nasa.gov) or email the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov).

Read our doc on how to get AWS Credentials to retrieve this data:

Details →
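Because each tavg1 value is stamped at the center of its hour, matching it to an on-the-hour event means allowing for a 30-minute offset. A minimal sketch that generates the 24 timestamps described above for one day (naive datetimes interpreted as UTC; the function name is illustrative):

```python
from datetime import date, datetime, time, timedelta


def tavg1_timestamps(day: date) -> list[datetime]:
    """Central times of the 24 hourly averaging windows for one day (00:30 ... 23:30 UTC)."""
    first = datetime.combine(day, time(0, 30))
    return [first + timedelta(hours=h) for h in range(24)]


stamps = tavg1_timestamps(date(2020, 1, 1))
print(stamps[0].time(), stamps[-1].time(), len(stamps))  # 00:30:00 23:30:00 24
```

Each stamp `t` stands for the average over the window from `t - 30 min` to `t + 30 min`.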

Usage examples

See 16 usage examples →

ESA WorldCover

agriculture, cog, disaster response, earth observation, geospatial, land cover, land use, machine learning, mapping, natural resource, satellite imagery, stac, sustainability, synthetic aperture radar

The European Space Agency (ESA) WorldCover product provides global land cover maps for 2020 and 2021 at 10 m resolution based on Copernicus Sentinel-1 and Sentinel-2 data. The WorldCover product comes with 11 land cover classes and has been generated in the framework of the ESA WorldCover project, part of the 5th Earth Observation Envelope Programme (EOEP-5) of the European Space Agency. A first version of the product (v100), containing the 2020 map, was released in October 2021. The 2021 map was released in October 2022 using an improved algorithm (v200). The WorldCover 2020 and 2021 maps we...
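Since the two annual maps come from different algorithm versions, a script that mixes years should record which version it read. A minimal sketch of that lookup plus the nominal pixel count per square kilometre at 10 m; names are illustrative, and the pixel count is only approximate if the files use a geographic (latitude/longitude) grid rather than a metric projection:

```python
# Year -> product version, as described above.
WORLDCOVER_VERSION = {2020: "v100", 2021: "v200"}

PIXEL_SIZE_M = 10  # nominal resolution


def worldcover_version(year: int) -> str:
    """Return the WorldCover product version that contains the map for `year`."""
    try:
        return WORLDCOVER_VERSION[year]
    except KeyError:
        raise ValueError(f"no WorldCover map for {year}; available: {sorted(WORLDCOVER_VERSION)}") from None


def pixels_per_km2(pixel_size_m: float = PIXEL_SIZE_M) -> int:
    """Number of square pixels covering one square kilometre at the given size."""
    return round((1000 / pixel_size_m) ** 2)


print(worldcover_version(2021), pixels_per_km2())  # v200 10000
```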

Details →

Usage examples

See 15 usage examples →

Genome Aggregation Database (gnomAD)

bioinformatics, genetic, genomic, life sciences, population, population genetics, short read sequencing, whole genome sequencing

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators that aggregates and harmonizes both exome and genome data from a wide range of large-scale human sequencing projects. The summary data provided here are released for the benefit of the wider scientific community without restriction on use. The v4.1 data set (GRCh38) spans 730,947 exome sequences and 76,215 whole-genome sequences from unrelated individuals, of diverse ancestries, sequenced as part of various disease-specific and population genetic studies. The gnomAD Principal Investigators and team can be found here, and the groups that have contributed data to the current release are listed here. Sign up for the gnom...

Details →

Usage examples

See 15 usage examples →

GeoNet Aotearoa New Zealand Data

broadband, coastal, Continuously Operating Reference Station (CORS), earthquakes, geophysics, geoscience, GNSS, GPS, oceans, RINEX, seismology

GeoNet provides geological hazard information for Aotearoa New Zealand. This dataset contains data and products recorded by the GeoNet sensor network.

GNSS (Global Navigation Satellite System) data include raw data in proprietary formats and in Receiver Independent Exchange Format (RINEX), as well as local tie-in surveys conducted during equipment changes; more details can be found on the GeoNet geodetic page.
Coastal gauge data include relative sea-level measurements from tsunami monitoring gauges. Raw and quality-controlled data are provided in CREX format (Character Form for the Representation and eXchange of meteorological data); more details can be found on the GeoNet coastal tsunami monitoring gauges page.
Camera image data include webcam images from the GeoNet Volcano monitoring network and the Built Environment Instrumentation Programme; more details can be found on the GeoNet camera page.
Waveform data include raw data from weak- and strong-motion instruments of the GeoNet seismic networks; more details can be found on the GeoNet seismic waveform page.
Seismic data products include strong-motion derived data; more details can be found on the GeoNet Strong Motion products page.
Time Series data products include derived time...

Details →

Usage examples

See 15 usage examples →

MERRA-2 inst3_3d_aer_Nv: 3d,3-Hourly,Instantaneous,Model-Level,Assimilation,Aerosol Mixing Ratio 0.625 x 0.5 degree

agriculture, air quality, atmosphere, biodiversity, carbon, climate, coastal, datacenter, ecosystems, global, hydrology, ice, land, metadata, netcdf, opendap, water

M2I3NVAER (or inst3_3d_aer_Nv) is an instantaneous 3-dimensional 3-hourly data collection in Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of assimilated aerosol mixing ratio parameters at 72 model layers, such as dust, sulphur dioxide, sea salt, black carbon, and organic carbon. The data field is available every three hours starting from 00:00 UTC, i.e., 00:00, 03:00, …, 21:00 UTC. Section 4.2 of the MERRA-2 File Specification document lists nominal pressure values for a 1000 hPa surface pressure, referring to the top edge of each layer. lev=1 is the top layer, and lev=72 is the bottom (surface) model layer. MERRA-2 is the latest version of global atmospheric reanalysis for the satellite era produced by the NASA Global Modeling and Assimilation Office (GMAO) using the Goddard Earth Observing System Model (GEOS) version 5.12.4. The dataset covers the period of 1980-present with a latency of ~3 weeks after the end of a month.

Data Reprocessing: Please check “Records of MERRA-2 Data Reprocessing and Service Changes” linked from the “Documentation” tab on this page. Note that a reprocessed data filename is different from the original file.

MERRA-2 Mailing List: Sign up to receive information on reprocessing of data, changes to tools and services, and data announcements from GMAO. Contact the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov) to be added to the list.

Questions: If you have a question, please read the “MERRA-2 File Specification Document”, the “MERRA-2 Data Access – Quick Start Guide”, and the FAQs linked from the “Documentation” tab on this page. If that does not answer your question, you may post it to the NASA Earthdata Forum (forum.earthdata.nasa.gov) or email the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov).

Read our doc on how to get AWS Credentials to retrieve this data:

Details →
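Since `lev` counts downward from the model top, code that wants the lowest layer (for example, near-surface aerosol) must take lev=72, not lev=1. A minimal sketch of converting a `lev` value to a 1-based count up from the surface; it assumes the file's `lev` coordinate runs 1..72 as described, and the helper name is illustrative:

```python
N_LEVELS = 72  # model layers in this collection


def surface_index(lev: int) -> int:
    """Convert MERRA-2 lev (1 = top layer, 72 = surface layer) to a 1-based count up from the surface."""
    if not 1 <= lev <= N_LEVELS:
        raise ValueError(f"lev must be in 1..{N_LEVELS}")
    return N_LEVELS + 1 - lev


print(surface_index(72), surface_index(1))  # 1 72
```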

Usage examples

See 15 usage examples →

MERRA-2 inst3_3d_asm_Np: 3d,3-Hourly,Instantaneous,Pressure-Level,Assimilation,Assimilated Meteorological Fields

agriculture, air temperature, atmosphere, biodiversity, climate, coastal, datacenter, ecosystems, global, hydrology, ice, land, metadata, netcdf, opendap, water

M2I3NPASM (or inst3_3d_asm_Np) is an instantaneous 3-dimensional 3-hourly data collection in Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of assimilated meteorological parameters at 42 pressure levels, such as temperature, wind components, vertical pressure velocity, water vapor, ozone mass mixing ratio, and layer height. The data field is available every three hours starting from 00:00 UTC, i.e., 00:00, 03:00, …, 21:00 UTC. Information on the pressure levels can be found in Section 4.2 of the MERRA-2 File Specification document. MERRA-2 is the latest version of global atmospheric reanalysis for the satellite era produced by the NASA Global Modeling and Assimilation Office (GMAO) using the Goddard Earth Observing System Model (GEOS) version 5.12.4. The dataset covers the period of 1980-present with a latency of ~3 weeks after the end of a month.

Data Reprocessing: Please check “Records of MERRA-2 Data Reprocessing and Service Changes” linked from the “Documentation” tab on this page. Note that a reprocessed data filename is different from the original file.

MERRA-2 Mailing List: Sign up to receive information on reprocessing of data, changes to tools and services, and data announcements from GMAO. Contact the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov) to be added to the list.

Questions: If you have a question, please read the “MERRA-2 File Specification Document”, the “MERRA-2 Data Access – Quick Start Guide”, and the FAQs linked from the “Documentation” tab on this page. If that does not answer your question, you may post it to the NASA Earthdata Forum (forum.earthdata.nasa.gov) or email the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov).

Read our doc on how to get AWS Credentials to retrieve this data:

Details →

Usage examples

See 15 usage examples →

MERRA-2 inst3_3d_asm_Nv: 3d,3-Hourly,Instantaneous,Model-Level,Assimilation,Assimilated Meteorological Fields 0.625 x 0.5 degree

agriculture, air temperature, atmosphere, biodiversity, climate, coastal, datacenter, ecosystems, global, hydrology, ice, land, metadata, netcdf, opendap, water

M2I3NVASM (or inst3_3d_asm_Nv) is an instantaneous, 3-dimensional, 3-hourly data collection in the Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of assimilated meteorological parameters at 72 model layers, such as temperature, wind components, vertical pressure velocity, water vapor, and layer height. The data fields are available every three hours starting from 00:00 UTC (00:00, 03:00, …, 21:00 UTC). Section 4.2 of the MERRA-2 File Specification document lists nominal pressure values, computed for a surface pressure of 1000 hPa, at the top edge of each layer. The index lev=1 denotes the top layer, and lev=72 the bottom (surface) model layer. MERRA-2 is the latest version of global atmospheric reanalysis for the satellite era, produced by the NASA Global Modeling and Assimilation Office (GMAO) using the Goddard Earth Observing System Model (GEOS) version 5.12.4. The dataset covers the period from 1980 to the present, with a latency of about 3 weeks after the end of each month.

Data Reprocessing: Please check “Records of MERRA-2 Data Reprocessing and Service Changes” linked from the “Documentation” tab on this page. Note that a reprocessed data file has a different filename from the original file.

MERRA-2 Mailing List: Sign up to receive information on reprocessing of data, changes to tools and services, and data announcements from GMAO. Contact the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov) to be added to the list.

Questions: If you have a question, please read the “MERRA-2 File Specification Document”, the “MERRA-2 Data Access – Quick Start Guide”, and the FAQs linked from the “Documentation” tab on this page. If these do not answer your question, you may post it to the NASA Earthdata Forum (forum.earthdata.nasa.gov) or email the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov).

Read our doc on how to get AWS credentials to retrieve this data: Details →
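Model-level indices in this collection run top-down: lev=1 is the top layer and lev=72 is the surface layer. This is easy to get backwards when extracting near-surface values. Below is a minimal sketch, assuming one column profile is held as a Python list ordered lev=1 to 72. `height_index` and `surface_value` are hypothetical helpers, and the profile is synthetic.

```python
N_LEVELS = 72  # model layers in inst3_3d_asm_Nv


def height_index(lev: int) -> int:
    """Convert lev (1 = top layer, 72 = surface layer) to a bottom-up index
    where 1 is the surface layer and 72 is the top layer."""
    if not 1 <= lev <= N_LEVELS:
        raise ValueError(f"lev must be in 1..{N_LEVELS}, got {lev}")
    return N_LEVELS + 1 - lev


def surface_value(profile: list) -> float:
    """Return the lowest-layer value from a profile ordered lev=1..72."""
    if len(profile) != N_LEVELS:
        raise ValueError("expected one value per model layer")
    return profile[-1]  # lev=72 is the last element in top-down order


temps = [200.0 + i for i in range(N_LEVELS)]  # synthetic top-down profile
print(height_index(72), height_index(1), surface_value(temps))
```

With xarray, the equivalent selection would typically be `ds.isel(lev=-1)`, assuming the vertical dimension is named `lev`.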

Usage examples

See 15 usage examples →

NOAA Joint Polar Satellite System (JPSS)

agriculture, climate, meteorological, weather

Near-real-time JPSS data is now flowing! See the bucket information on the dataset page to access products.
Satellites in the JPSS constellation gather global measurements of atmospheric, terrestrial and oceanic conditions, including sea and land surface temperatures, vegetation, clouds, rainfall, snow and ice cover, fire locations and smoke plumes, atmospheric temperature, water vapor and ozone. JPSS delivers key observations for the Nation's essential products and services, including forecasting severe weather like hurricanes, tornadoes and blizzards days in advance, and assessin...

Details →

Usage examples

See 15 usage examples →

SpaceNet

computer vision, disaster response, earth observation, geospatial, machine learning, satellite imagery

SpaceNet was launched in August 2016 as an open innovation project offering a repository of freely available imagery with co-registered map features. Before SpaceNet, computer vision researchers had few options for obtaining free, precision-labeled, high-resolution satellite imagery. Today, SpaceNet hosts datasets developed by its own team, along with datasets from projects such as IARPA’s Functional Map of the World (fMoW).

Details →

Usage examples

See 15 usage examples →

The Singapore Nanopore Expression Data Set

bam, bioinformatics, fast5, fasta, fastq, genomic, life sciences, long read sequencing, short read sequencing, transcriptomics

The Singapore Nanopore Expression (SG-NEx) project is an international collaboration to generate reference transcriptomes and a comprehensive benchmark data set for long read Nanopore RNA-Seq. Transcriptome profiling is done using PCR-cDNA sequencing (PCR-cDNA), amplification-free cDNA sequencing (direct cDNA), direct sequencing of native RNA (direct RNA), and short read RNA-Seq. The SG-NEx core data includes 5 of the most commonly used cell lines and it is extended with additional cell lines and samples that cover a broad range of human tissues. All core samples are sequenced with at least 3 ...

Details →

Usage examples

See 15 usage examples →

2021 Amazon Last Mile Routing Research Challenge Dataset

amazon.science, analytics, deep learning, geospatial, last mile, logistics, machine learning, optimization, routing, transportation, urban

The 2021 Amazon Last Mile Routing Research Challenge was an innovative research initiative led by Amazon.com and supported by the Massachusetts Institute of Technology’s Center for Transportation and Logistics. Over a period of 4 months, participants were challenged to develop innovative machine learning-based methods to enhance classic optimization-based approaches to solve the travelling salesperson problem, by learning from historical routes executed by Amazon delivery drivers. The primary goal of the Amazon Last Mile Routing Research Challenge was to foster innovative applied research in r...
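The challenge builds on the travelling salesperson problem (TSP). For readers new to the TSP, the sketch below scores a closed route and builds one with the nearest-neighbour heuristic. This heuristic is a generic textbook baseline, not a method from the challenge. The stop coordinates are hypothetical, and distances are straight-line rather than road travel times.

```python
import math


def tour_length(points, order):
    """Total length of a closed tour visiting points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def nearest_neighbour_tour(points, start=0):
    """Greedy construction: always travel to the closest unvisited stop."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


stops = [(0, 0), (0, 2), (3, 2), (3, 0), (1, 1)]  # hypothetical depot + stops
route = nearest_neighbour_tour(stops)
print(route, round(tour_length(stops, route), 3))
```

Greedy tours like this are fast but often far from optimal. Approaches that learn from historical driver routes aim to produce sequences that are better in practice than such purely geometric baselines.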

Details →

Usage examples

See 17 usage examples →

Distributed Archives for Neurophysiology Data Integration (DANDI)

biology, calcium imaging, cell imaging, electrophysiology, hdf5, life sciences, neuroimaging, neurophysiology, neuroscience, zarr

DANDI is a public archive of neurophysiology datasets, including raw and processed data, and associated software containers. Datasets are shared under Creative Commons CC0 or CC-BY licenses. This US BRAIN Initiative-supported archive provides a broad range of cellular neurophysiology data, including intracellular and extracellular electrophysiology, optophysiology, calcium imaging, fiber photometry, behavioral time series, and images from immunostaining experiments, from over 20 species.

Data is organized using community standards: NWB (Neurodata Without Borders), BIDS (Brain Imaging Data Structure), NGFF (Next Generation File Format) for Zarr-based imaging data, and NIDM (Neuro Imaging Data Model).

The S3 bucket is organized as follows:

  • dandisets/ - Metadata and manifests for each Dandiset version; manifests reference keys under blobs/ or zarrs/ for the actual data. See the DANDI schema for manifest format specifications.
  • blobs/ - Deduplicated binary data (NWB files) indexed by content hash.
  • zarrs/ - Zarr arrays for large imaging datasets.
D...
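The bucket layout above maps directly onto S3 key prefixes. Below is a minimal sketch built on boto3's `list_objects_v2` with anonymous (unsigned) requests. It assumes the bucket is named `dandiarchive`, lives in `us-east-2`, and allows anonymous listing; confirm all three on the dataset's Details page. `classify_key` and `list_top_level` are hypothetical helpers.

```python
PREFIX_ROLES = {
    "dandisets/": "metadata and manifests",
    "blobs/": "deduplicated NWB blob",
    "zarrs/": "Zarr array",
}


def classify_key(key: str) -> str:
    """Map an S3 object key to its role based on its top-level prefix."""
    for prefix, role in PREFIX_ROLES.items():
        if key.startswith(prefix):
            return role
    return "other"


def list_top_level(bucket: str = "dandiarchive", region: str = "us-east-2"):
    """List top-level prefixes anonymously (bucket and region are assumptions)."""
    import boto3  # imported lazily so classify_key works without boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", region_name=region,
                      config=Config(signature_version=UNSIGNED))
    resp = s3.list_objects_v2(Bucket=bucket, Delimiter="/")
    return [p["Prefix"] for p in resp.get("CommonPrefixes", [])]
```

Because manifests under `dandisets/` point into `blobs/` and `zarrs/`, a typical workflow reads a manifest first and then fetches only the referenced keys.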

Details →

Usage examples

See 14 usage examples →

Digital Earth Africa Global Mangrove Watch

coastal, cog, deafrica, earth observation, geospatial, land cover, natural resource, satellite imagery, stac, sustainability

The Global Mangrove Watch (GMW) dataset is the result of a collaboration between Aberystwyth University (U.K.), solo Earth Observation (soloEO; Japan), Wetlands International, the World Conservation Monitoring Centre (UNEP-WCMC) and the Japan Aerospace Exploration Agency (JAXA). The primary objective of producing this dataset is to provide countries lacking a national mangrove monitoring system with first-cut mangrove extent and change maps, to help safeguard against further mangrove forest loss and degradation. The Global Mangrove Watch dataset (version 2) consists of a global baseline map of ...

Details →

Usage examples

See 13 usage examples →

Digital Earth Africa Landsat Collection 2 Level 2

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac

Digital Earth Africa (DE Africa) provides free and open access to a copy of Landsat Collection 2 Level-2 products over Africa. These products are produced and provided by the United States Geological Survey (USGS). The Landsat series of Earth Observation satellites, jointly led by USGS and NASA, has been continuously acquiring images of the Earth’s land surface since 1972. DE Africa provides data from the Landsat 5, 7 and 8 satellites, including historical observations dating back to the late 1980s and regularly updated new acquisitions. New Level-2 Landsat 7 and Landsat 8 data are available after 15...
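Level-2 bands are stored as scaled integers, so a common first step is converting them to physical units. This minimal sketch assumes the standard USGS Collection 2 Level-2 scale factors also apply to the DE Africa copy:

- surface reflectance = DN × 0.0000275 − 0.2
- surface temperature in kelvin = DN × 0.00341802 + 149.0

It also assumes that DN 0 marks no-data. Verify both against the product documentation. The helper names are hypothetical.

```python
SR_SCALE, SR_OFFSET = 0.0000275, -0.2   # assumed surface reflectance scaling
ST_SCALE, ST_OFFSET = 0.00341802, 149.0  # assumed surface temperature (K)
NODATA = 0  # assumed fill value


def to_reflectance(dn: int):
    """Convert a surface reflectance DN to unitless reflectance (None if no-data)."""
    return None if dn == NODATA else dn * SR_SCALE + SR_OFFSET


def to_celsius(dn: int):
    """Convert a surface temperature DN to degrees Celsius (None if no-data)."""
    return None if dn == NODATA else dn * ST_SCALE + ST_OFFSET - 273.15


print(round(to_reflectance(10000), 4), round(to_celsius(44000), 2))
```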

Details →

Usage examples

See 13 usage examples →

Fly Brain Anatomy: FlyLight Gen1 and Split-GAL4 Imagery

biology, fluorescence imaging, image processing, imaging, life sciences, microscopy, neurobiology, neuroimaging, neuroscience

This data set, made available by Janelia's FlyLight project, consists of fluorescence images of Drosophila melanogaster driver lines, aligned to standard templates, and stored in formats suitable for rapid searching in the cloud. Additional data will be added as it is published.

Details →

Usage examples

See 13 usage examples →

RADIANT Public Data

cancer, genetic, genomic, Homo sapiens, life sciences, medical imaging, pediatric, radiology, transcriptomics, whole genome sequencing

The Real-time Analysis and Discovery in Integrated And Networked Technologies (RADIANT) initiative seeks to develop an extensible, federated framework for rapid exchange of multimodal clinical and research data in support of accelerated discovery and patient impact. Coordination and implementation of initial RADIANT deployments will leverage a network of more than 35 partnered health care systems and participating patient families within the Children’s Brain Tumor Network (CBTN) and the Pediatric Neuro-Oncology Consortium (PNOC). This data set is composed of public multi-modal data provisio...

Details →

Usage examples

See 13 usage examples →

Digital Earth Africa - Copernicus Global Land Service - Lake Water Quality

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac, water

The Copernicus Global Land Service – Lake Water Quality products offer a comprehensive, satellite-derived monitoring system for assessing key water quality indicators in large lakes, typically those greater than 50 hectares. These datasets are generated using optical satellite sensors, primarily Sentinel-2 MSI and Sentinel-3 OLCI, with earlier archives derived from Envisat MERIS. Spanning multiple spatial resolutions (100 m and 300 m) and temporal scales (10-day composites), they support both near-real-time and retrospective assessments of inland water quality. Key parameters include surf...
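To match an observation date to its 10-day composite, you need the compositing calendar. This minimal sketch assumes the composites follow the dekad convention common in Copernicus Global Land products: days 1 to 10, days 11 to 20, and day 21 to the end of the month. Check the product's user manual before relying on it. `dekad_of` is a hypothetical helper.

```python
import calendar
from datetime import date


def dekad_of(day: date):
    """Return (dekad number 1-3, first day, last day) for the period containing `day`."""
    if day.day <= 10:
        return 1, day.replace(day=1), day.replace(day=10)
    if day.day <= 20:
        return 2, day.replace(day=11), day.replace(day=20)
    last = calendar.monthrange(day.year, day.month)[1]  # days in the month
    return 3, day.replace(day=21), day.replace(day=last)


print(dekad_of(date(2024, 2, 25)))
```

Under this convention, the third dekad varies in length from 8 to 11 days depending on the month.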

Details →

Usage examples

See 11 usage examples →

Digital Earth Africa CHIRPS Rainfall

agriculture, climate, cog, deafrica, earth observation, food security, geospatial, meteorological, satellite imagery, stac, sustainability

Digital Earth Africa (DE Africa) provides free and open access to a copy of the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) monthly and daily products over Africa. The CHIRPS rainfall maps are produced and provided by the Climate Hazards Center in collaboration with the US Geological Survey, and use both rain gauge and satellite observations. The CHIRPS-2.0 Africa Monthly dataset is regularly indexed to DE Africa from the CHIRPS monthly data. The CHIRPS-2.0 Africa Daily dataset is likewise indexed from the CHIRPS daily data. Both products have been converted to clou...
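Daily and monthly rainfall products are related by summation, which is useful when aggregating daily values over custom periods. This minimal sketch uses hypothetical daily totals (mm) for one pixel, with `None` standing in for missing days. It illustrates the aggregation only and is not how the published CHIRPS monthly product is produced. `monthly_totals` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily_mm):
    """Sum daily rainfall (mm) into (year, month) totals, skipping missing days."""
    totals = defaultdict(float)
    for day, mm in daily_mm.items():
        if mm is not None:  # None marks a missing value in this sketch
            totals[(day.year, day.month)] += mm
    return dict(totals)


daily = {date(2021, 1, 30): 4.5, date(2021, 1, 31): 0.0,
         date(2021, 2, 1): 12.25, date(2021, 2, 2): None}
print(monthly_totals(daily))
```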

Details →

Usage examples

See 11 usage examples →

Digital Earth Africa Coastlines

climate, coastal, deafrica, earth observation, geospatial, satellite imagery, sustainability

Africa's long and dynamic coastline is subject to a wide range of pressures, including extreme weather and climate, sea level rise and human development. Understanding how the coastline responds to these pressures is crucial to managing this region, from social, environmental and economic perspectives. The Digital Earth Africa Coastlines (provisional) is a continental dataset that includes annual shorelines and rates of coastal change along the entire African coastline from 2000 to the present. The product combines satellite data from the Digital Earth Africa program with tidal modelling t...
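A rate of coastal change can be summarised as the slope of shoreline position against time. This minimal sketch makes two assumptions: annual shoreline positions are given as distances (m) along a transect from a fixed baseline, and positive values mean seaward movement. Ordinary least squares is a generic choice here; the product's own method is described in its documentation. The data are hypothetical.

```python
def rate_of_change(years, distances_m):
    """Least-squares slope of shoreline distance vs. year, in metres per year."""
    n = len(years)
    if n < 2 or n != len(distances_m):
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(distances_m) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances_m))
    return sxy / sxx


years = [2000, 2001, 2002, 2003, 2004]
dist = [0.0, -1.5, -2.5, -4.5, -6.0]  # hypothetical retreating shoreline
print(round(rate_of_change(years, dist), 2))
```

Under the stated sign convention, a negative slope indicates retreat (erosion) and a positive slope indicates seaward growth.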

Details →

Usage examples

See 11 usage examples →

Digital Earth Africa GeoMAD

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac

GeoMAD is the Digital Earth Africa (DE Africa) surface reflectance geomedian and triple Median Absolute Deviation data service. It is a cloud-free composite of satellite data compiled over specific timeframes. The geomedian component combines measurements collected over the specified timeframe to produce one representative, multispectral measurement for every pixel unit of the African continent. The end result is a comprehensive dataset that can be used to generate true-colour images for visual inspection of anthropogenic or natural landmarks. The full spectral dataset can be used to develop m...
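The geomedian of a pixel's observations is the multiband point that minimises the summed Euclidean distance to all of them. Unlike a per-band median, it keeps the spectral relationships between bands intact. This minimal sketch uses Weiszfeld's classic iteration, with a simple skip for observations that coincide with the current estimate. The production GeoMAD service uses its own optimised implementation, and the band values here are hypothetical. `emad` illustrates one of the three MADs, the Euclidean one, assuming it is defined as the median distance to the geomedian.

```python
import math
import statistics


def geomedian(obs, tol=1e-7, max_iter=1000):
    """Weiszfeld iteration for the geometric median of one pixel's band vectors."""
    n_bands = len(obs[0])
    # Start from the per-band mean of the (cloud-free) observations.
    est = [sum(o[b] for o in obs) / len(obs) for b in range(n_bands)]
    for _ in range(max_iter):
        num = [0.0] * n_bands
        denom = 0.0
        for o in obs:
            d = math.dist(o, est)
            if d < 1e-12:
                continue  # simple safeguard: skip points on the current estimate
            w = 1.0 / d  # inverse-distance weight
            denom += w
            for b in range(n_bands):
                num[b] += w * o[b]
        if denom == 0.0:  # every observation equals the estimate
            return est
        new = [x / denom for x in num]
        if math.dist(new, est) < tol:
            return new
        est = new
    return est


def emad(obs, gm):
    """Euclidean MAD: median distance from each observation to the geomedian."""
    return statistics.median([math.dist(o, gm) for o in obs])


pixel = [(0.10, 0.20, 0.30), (0.12, 0.21, 0.29), (0.11, 0.19, 0.31),
         (0.60, 0.70, 0.80)]  # hypothetical 3-band observations, one outlier
gm = geomedian(pixel)
print([round(v, 3) for v in gm], round(emad(pixel, gm), 3))
```

The outlier pulls the geomedian far less than it would pull a per-band mean. This robustness is what makes the geomedian suitable for cloud-free composites.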

Details →

Usage examples

See 11 usage examples →

Digital Earth Africa Sentinel-2 Level-2A Surface Reflectance Collection 1

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac

The Sentinel-2 mission is part of the European Union Copernicus programme for Earth observations. Sentinel-2 consists of twin satellites, Sentinel-2A (launched 23 June 2015) and Sentinel-2B (launched 7 March 2017). The two satellites share the same orbit but are phased 180° apart for optimal coverage and data delivery. Their combined data is used in the Digital Earth Africa Sentinel-2 product. Together, they cover all Earth’s land surfaces, large islands, inland and coastal waters every 3-5 days. Sentinel-2 data is tiered by level of pre-processing. Level-0, Level-1A and Level-1B data contain raw data fr...

Details →

Usage examples

See 11 usage examples →

Digital Earth Africa Water Observations from Space

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac, water

Water Observations from Space (WOfS) is a service that draws on satellite imagery to provide historical surface water observations of the whole African continent. WOfS allows users to understand the location and movement of inland and coastal water present in the African landscape. It shows where water is usually present, where it is seldom observed, and where inundation of the surface has been observed by satellite. The observations are generated by applying the WOfS classification algorithm to Landsat satellite data. There are several WOfS products available for the African continent including scene-level dat...
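The distinction between water that is usually present and water that is seldom observed is naturally expressed as a frequency. That frequency is the number of wet observations divided by the number of clear observations for each pixel. This minimal sketch assumes each scene-level classification for a pixel is labelled "wet", "dry", or "not clear" (cloud, shadow or no data). The labels and helper are hypothetical.

```python
def water_frequency(observations):
    """Fraction of clear observations classified as wet (None if never clear)."""
    wet = sum(1 for o in observations if o == "wet")
    clear = sum(1 for o in observations if o in ("wet", "dry"))
    return wet / clear if clear else None


pixel = ["wet", "dry", "not clear", "wet", "wet", "dry", "not clear", "dry"]
print(water_frequency(pixel))  # 3 wet of 6 clear observations
```

Excluding unclear observations from the denominator keeps persistently cloudy pixels from looking artificially dry.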

Details →

Usage examples

See 11 usage examples →

International Neuroimaging Data-Sharing Initiative (INDI)

Homo sapiens, imaging, life sciences, magnetic resonance imaging, neuroimaging, neuroscience

This bucket contains multiple neuroimaging datasets that are part of the International Neuroimaging Data-Sharing Initiative. Raw human and non-human primate neuroimaging data include 1) structural MRI; 2) functional MRI; 3) diffusion tensor imaging; and 4) electroencephalogram (EEG). In addition to the raw data, preprocessed data are also included for some datasets. A complete list of the available datasets can be found through the documentation link provided below.

Details →

Usage examples

See 11 usage examples →

Low Altitude Disaster Imagery (LADI) Dataset

aerial imagery, coastal, computer vision, disaster response, earth observation, earthquakes, geospatial, image processing, imaging, infrastructure, land, machine learning, mapping, natural resource, seismology, transportation, urban, water

The Low Altitude Disaster Imagery (LADI) Dataset consists of human and machine annotated airborne images collected by the Civil Air Patrol in support of various disaster responses from 2015-2023. Two key distinctions are the low altitude, oblique perspective of the imagery and disaster-related features, which are rarely featured in computer vision benchmarks and datasets.

Details →

Usage examples

See 11 usage examples →

Maxar Open Data Program

cog, disaster response, earth observation, geospatial, satellite imagery, stac

Pre- and post-event high-resolution satellite imagery in support of emergency planning, risk assessment, monitoring of staging areas and emergency response, damage assessment, and recovery. These images are generated using the Maxar ARD pipeline and tiled on an organized grid in analysis-ready, cloud-optimized formats.
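A tiling grid maps any location to a tile identifier. Maxar ARD tiles are commonly described as indexed by Bing-style quadkeys at zoom level 12. Treat that as an assumption and confirm it against the dataset's STAC metadata. The sketch below implements the standard Web Mercator tile and quadkey arithmetic; `tile_xy` and `quadkey` are hypothetical helper names.

```python
import math


def tile_xy(lat: float, lon: float, zoom: int):
    """Web Mercator tile column/row containing a WGS84 point."""
    lat = max(min(lat, 85.05112878), -85.05112878)  # Mercator latitude limit
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    siny = math.sin(math.radians(lat))
    y = int((0.5 - math.log((1 + siny) / (1 - siny)) / (4 * math.pi)) * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def quadkey(lat: float, lon: float, zoom: int = 12) -> str:
    """Interleave tile x/y bits into a base-4 quadkey, most significant first."""
    x, y = tile_xy(lat, lon, zoom)
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


print(quadkey(45.0, 45.0, 2), len(quadkey(10.0, 20.0)))
```

Each extra quadkey digit subdivides the parent tile into four. Prefix matching on quadkeys is therefore a cheap way to find all tiles inside a larger area.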

Details →

Usage examples

See 11 usage examples →

NOAA Operational Forecast System (OFS)

climate, coastal, disaster response, environmental, meteorological, oceans, water, weather

ANNOUNCEMENTS: [NOS OFS Version Updates and Implementation of Upgraded Oceanographic Forecast Modeling Systems for Lakes Superior and Ontario; Effective October 25, 2022](https://www.weather.gov/media/notification/pdf2/scn22-91_nos_loofs_lsofs_v3.pdf)

For decades, mariners in the United States have depended on NOAA's Tide Tables for the best estimate of expected water levels. These tables provide accurate predictions of the astronomical tide (i.e., the change in water level due to the gravitational effects of the moon and sun and the rotation of the Earth); however, they cannot predict water-level changes due to wind, atmospheric pressure, and river flow, which are often significant.

The National Ocean Service (NOS) has the mission and mandate to provide guidance and information to support navigation and coastal needs. To support this mission, NOS has been developing and implementing hydrodynamic model-based Operational Forecast Systems.

This forecast guidance provides oceanographic information that helps mariners safely navigate their local waters. This national network of hydrodynamic models provides users with operational nowcast and forecast guidance (out to 48 – 120 hours) on parameters such as water levels, water temperature, salinity, and currents. These forecast systems are implemented in critical ports, harbors, estuaries, Great Lakes and coastal waters of the United States, and form a national backbone of real-time data, tidal predictions, data management and operational modeling.

Nowcasts and forecasts are scientific predictions about the present and future states of water levels (and possibly currents and other relevant oceanographic variables, such as salinity and temperature) in a coastal area. These predictions rely on either observed data or forecasts from a numerical model. A nowcast incorporates recent (and often near real-time) observed meteorological, oceanographic, and/or river flow rate data. A nowcast covers the period from the recent past (e.g., the past few days) to the present, and it can make predictions for locations where observational data are not available. A forecast incorporates meteorological, oceanographic, and/or river flow rate forecasts and makes predictions for times where observational data will not be available. A forecast is usually initiated by the results of a nowcast.

OFS generally runs four times per day (every 6 hours) on NOAA's Weather and Climate Operational Supercomputing Systems (WCOSS) in a standard Coastal Ocean Modeling Framework (COMF) developed by the Center for Operational Oceanographic Products and Services (CO-OPS). COMF is a set...
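With four runs per day at 6-hour spacing, you can compute which nominal cycle is the most recent for a given time. This minimal sketch assumes cycles at 00, 06, 12 and 18 UTC. Individual systems may use other hours, so `cycle_hours` is a parameter. It also assumes a hypothetical fixed one-hour delay before output appears; actual availability varies by system. `latest_cycle` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def latest_cycle(now: datetime, cycle_hours=(0, 6, 12, 18),
                 delay=timedelta(hours=1)) -> datetime:
    """Most recent cycle whose nominal start is at least `delay` before `now`."""
    ref = now - delay  # hypothetical processing delay
    day = ref.replace(hour=0, minute=0, second=0, microsecond=0)
    for hour in sorted(cycle_hours, reverse=True):
        candidate = day + timedelta(hours=hour)
        if candidate <= ref:
            return candidate
    # Before the first cycle of the day: fall back to yesterday's last cycle.
    return day - timedelta(days=1) + timedelta(hours=max(cycle_hours))


now = datetime(2024, 5, 1, 0, 30, tzinfo=timezone.utc)
print(latest_cycle(now).strftime("%Y-%m-%d %HZ"))
```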

Details →

Usage examples

See 11 usage examples →

Open Targets

bioinformatics, biology, drug discovery, genetic, genomic, life sciences, protein

The Open Targets Platform is a comprehensive data integration tool that supports systematic identification and prioritisation of potential therapeutic drug targets. By integrating publicly available datasets including data generated by the Open Targets experimental and informatics research programmes, the Platform provides data and services to assist in the task of therapeutic hypothesis building.

Details →

Usage examples

See 11 usage examples →

The Cancer Dependency Map (DepMap) Cancer Cell Line Encyclopedia (CCLE) Dataset

bam, bioinformatics, biology, cancer, genetic, genomic, Homo sapiens, life sciences, short read sequencing, transcriptomics, whole exome sequencing, whole genome sequencing

This dataset consists of whole genome sequencing (WGS), whole exome sequencing (WES), and RNA sequencing files generated from ~1000 cancer cell lines described in Ghandi et al., 2019.

Details →

Usage examples

See 11 usage examples →

Alliance of Genome Resources

bioinformatics, biology, Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, fasta, gene expression, genetic, genome, genomic, Homo sapiens, life sciences, Mus musculus, protein, Rattus norvegicus, transcriptomics, vcf

The Alliance of Genome Resources is a consortium that integrates genomic, genetic, and molecular data from leading model organism databases including Drosophila melanogaster, Caenorhabditis elegans, Danio rerio (zebrafish), Mus musculus (mouse), Rattus norvegicus (rat), Saccharomyces cerevisiae (yeast), Xenopus laevis and Xenopus tropicalis (frogs), and human reference data. The Alliance provides comprehensive datasets including gene annotations, disease associations, expression data (bulk and single-cell RNA-Seq), protein and genetic interactions, orthology relationships, variants and alleles...

Details →

Usage examples

See 10 usage examples →

CBERS on AWS

agriculture, cog, disaster response, earth observation, geospatial, imaging, satellite imagery, stac

Imagery acquired by the China-Brazil Earth Resources Satellites (CBERS) 4 and 4A. The image files are recorded and processed by the Instituto Nacional de Pesquisas Espaciais (INPE) and converted to Cloud Optimized GeoTIFF format to optimize their use in cloud-based applications. The dataset contains all CBERS-4 MUX, AWFI, PAN5M and PAN10M scenes acquired since the start of the satellite mission and is updated daily with new scenes. CBERS-4A MUX Level 4 (Orthorectified) scenes are being ingested starting from April 13, 2021. CBERS-4A WFI Level 4 (Orthorectified) scenes are being ingested starting from ...

Details →

Usage examples

See 10 usage examples →

Digital Earth Africa Sentinel-1 Radiometrically Terrain Corrected

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac, synthetic aperture radar

DE Africa’s Sentinel-1 backscatter product is developed to be compliant with the CEOS Analysis Ready Data for Land (CARD4L) specifications. The Sentinel-1 mission, a constellation of two C-band Synthetic Aperture Radar (SAR) satellites, is operated by the European Space Agency (ESA) as part of the Copernicus Programme. The mission currently collects data every 12 days over Africa at a spatial resolution of approximately 20 m. Radar backscatter measures the amount of microwave radiation reflected back to the sensor from the ground surface. This measurement is sensitive to surface rough...
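Backscatter is often delivered as linear power and converted to decibels for display and thresholding. This minimal sketch assumes the product's values are linear backscatter (check the measurement units in the product documentation) and treats zero or negative values as no-data. `to_db` and `to_linear` are hypothetical helpers.

```python
import math


def to_db(linear: float):
    """Convert linear backscatter power to decibels: 10 * log10(value)."""
    return 10.0 * math.log10(linear) if linear > 0 else None


def to_linear(db: float) -> float:
    """Inverse conversion from decibels back to linear power."""
    return 10.0 ** (db / 10.0)


print(to_db(0.01), round(to_linear(-15.0), 5))
```

Averaging should usually be done in linear units and the result converted to dB afterwards, because the dB scale is logarithmic.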

Details →

Usage examples

See 10 usage examples →

Digital Earth Africa Sentinel-2 Level-2A

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac

The Sentinel-2 mission is part of the European Union Copernicus programme for Earth observations. Sentinel-2 consists of twin satellites, Sentinel-2A (launched 23 June 2015) and Sentinel-2B (launched 7 March 2017). The two satellites share the same orbit but are phased 180° apart for optimal coverage and data delivery. Their combined data is used in the Digital Earth Africa Sentinel-2 product. Together, they cover all Earth’s land surfaces, large islands, inland and coastal waters every 3-5 days. Sentinel-2 data is tiered by level of pre-processing. Level-0, Level-1A and Level-1B data contain raw data fr...

Details →

Usage examples

See 10 usage examples →

Garvan Institute Long Read Sequencing Benchmark Data

bioinformatics, genomic, life sciences, long read sequencing

The dataset contains reference samples that will be useful for benchmarking and comparing bioinformatics tools for genome analysis. Examples include NA12878 (HG001) and NA24385 (HG002), sequenced on an Oxford Nanopore Technologies (ONT) PromethION using the latest R10.4.1 flow cells, and UHR RNA (direct RNA), sequenced on an ONT PromethION using the latest RNA004 flow cells. Raw signal data output by the sequencer is provided for these datasets in BLOW5 format and can be rebasecalled when basecalling software updates bring accuracy and feature improvements over the years. Raw signal data is not only for ...

Details →

Usage examples

See 10 usage examples →

Multi-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST)

climate, earth observation, environmental, natural resource, oceans, satellite imagery, water, weather

A global, gap-free, gridded, daily 1 km Sea Surface Temperature (SST) dataset created by merging multiple Level-2 satellite SST datasets. Those input datasets include the NASA Advanced Microwave Scanning Radiometer-EOS (AMSR-E), the JAXA Advanced Microwave Scanning Radiometer 2 (AMSR-2) on GCOM-W1, the Moderate Resolution Imaging Spectroradiometers (MODIS) on the NASA Aqua and Terra platforms, the US Navy microwave WindSat radiometer, the Advanced Very High Resolution Radiometer (AVHRR) on several NOAA satellites, and in situ SST observations from the NOAA iQuam project. Data are available fro...

Details →

Usage examples

See 10 usage examples →

NHGRI AnVIL Project

biology, gene expression, genome, genomic, Homo sapiens, life sciences

The NHGRI Analysis, Visualization, and Informatics Lab-space (AnVIL) Project (https://anvilproject.org/) is the National Human Genome Research Institute's cloud-based platform for genomic data sharing and analysis. AnVIL hosts widely used human genome reference datasets generated through NHGRI-funded research. AnVIL on Open Data on AWS provides public access to open-access datasets available through AnVIL. The project is a collaborative effort involving NHGRI, the Broad Institute, Johns Hopkins University, the University of California Santa Cruz, Vanderbilt University Medical Center, Brigh...

Details →

Usage examples

  • A complete reference genome improves analysis of human genetic variation by Sergey Aganezov, Stephanie M. Yan, Daniela C. Soto, Melanie Kirsche, Samantha Zarate, Pavel Avdeyev, Dylan J. Taylor, Kishwar Shafin, Alaina Shumate, Chunlin Xiao, Justin Wagner, Jennifer McDaniel, Nathan D. Olson, Michael E. G. Sauria, Mitchell R. Vollger, Arang Rhie, Melissa Meredith, Skylar Martin, Joyce Lee, Sergey Koren, Jeffrey A. Rosenfeld, Benedict Paten, Ryan Layer, Chen-Shan Chin, Fritz J. Sedlazeck, Nancy F. Hansen, Danny E. Miller, Adam M. Phillippy, Karen H. Miga, Rajiv C. McCoy, Megan Y. Dennis, Justin M. Zook, Michael C. Schatz
  • Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation by Mikhail Kolmogorov, Kimberley J. Billingsley, Mira Mastoras, Melissa Meredith, Jean Monlong, Ryan Lorig-Roach, Mobin Asri, Pilar Alvarez Jerez, Laksh Malik, Ramita Dewan, Xylena Reed, Rylee M. Genner, Kensuke Daida, Sairam Behera, Kishwar Shafin, Trevor Pesout, Jeshuwin Prabakaran, Paolo Carnevali, Jianzhi Yang, Arang Rhie, Sonja W. Scholz, Bryan J. Traynor, Karen H. Miga, Miten Jain, Winston Timp, Adam M. Phillippy, Mark Chaisson, Fritz J. Sedlazeck, Cornelis Blauwendraat, Benedict Paten
  • The complete sequence of a human Y chromosome by Arang Rhie, Sergey Nurk, Monika Cechova, Savannah J. Hoyt, Dylan J. Taylor, Nicolas Altemose, Paul W. Hook, Sergey Koren, Mikko Rautiainen, Ivan A. Alexandrov, Jamie Allen, Mobin Asri, Andrey V. Bzikadze, Nae-Chyun Chen, Chen-Shan Chin, Mark Diekhans, Paul Flicek, Giulio Formenti, Arkarachai Fungtammasan, Carlos Garcia Giron, Erik Garrison, Ariel Gershman, Jennifer L. Gerton, Patrick G. S. Grady, Andrea Guarracino, Leanne Haggerty, Reza Halabian, Nancy F. Hansen, Robert Harris, Gabrielle A. Hartley, William T. Harvey, Marina Haukness, Jakob Heinz, Thibaut Hourlier, Robert M. Hubley, Sarah E. Hunt, Stephen Hwang, Miten Jain, Rupesh K. Kesharwani, Alexandra P. Lewis, Heng Li, Glennis A. Logsdon, Julian K. Lucas, Wojciech Makalowski, Christopher Markovic, Fergal J. Martin, Ann M. Mc Cartney, Rajiv C. McCoy, Jennifer McDaniel, Brandy M. McNulty, Paul Medvedev, Alla Mikheenko, Katherine M. Munson, Terence D. Murphy, Hugh E. Olsen, Nathan D. Olson, Luis F. Paulin, David Porubsky, Tamara Potapova, Fedor Ryabov, Steven L. Salzberg, Michael E. G. Sauria, Fritz J. Sedlazeck, Kishwar Shafin, Valery A. Shepelev, Alaina Shumate, Jessica M. Storer, Likhitha Surapaneni, Angela M. Taravella Oill, Françoise Thibaud-Nissen, Winston Timp, Marta Tomaszkiewicz, Mitchell R. Vollger, Brian P. Walenz, Allison C. Watwood, Matthias H. Weissensteiner, Aaron M. Wenger, Melissa A. Wilson, Samantha Zarate, Yiming Zhu, Justin M. Zook, Evan E. Eichler, Rachel J. O’Neill, Michael C. Schatz, Karen H. Miga, Kateryna D. Makova, Adam M. Phillippy
  • The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update by The Galaxy Community
  • Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing by Sam Kovaka, Shujun Ou, Katharine M. Jenike, Michael C. Schatz

See 13 usage examples →

New Zealand Imagery

aerial imagery, cog, earth observation, geospatial, satellite imagery, stac

The New Zealand Imagery dataset consists of New Zealand's publicly owned aerial and satellite imagery, which is freely available to use under an open licence. The dataset ranges from the latest high-resolution aerial imagery, down to 5 cm in some urban areas, to lower-resolution satellite imagery that provides full coverage of mainland New Zealand, Chathams and other offshore islands. It also includes historical imagery that has been scanned from film, orthorectified (removing distortions) and georeferenced (correctly positioned) to create a unique and crucial record of changes to the New Zea...

Details →

Usage examples

See 10 usage examples →

RADARSAT-1

agriculture, cog, disaster response, earth observation, geospatial, global, ice, satellite imagery, synthetic aperture radar

Developed and operated by the Canadian Space Agency, RADARSAT-1 is Canada's first commercial Earth observation satellite.

Details →

Usage examples

See 10 usage examples →

Catalina Sky Survey (CSS) subset data on AWS

astronomy, object detection, planetary, survey

Raw survey data used to discover Near-Earth Objects (NEOs) that could potentially impact Earth.

Details →

Usage examples

See 9 usage examples →

Department of Energy's Open Energy Data Initiative (OEDI)

energy, environmental, geospatial, lidar, model, solar

Data released under the Department of Energy's (DOE) Open Energy Data Initiative (OEDI). The Open Energy Data Initiative aims to improve and automate access to high-value energy data sets across the U.S. Department of Energy’s programs, offices, and national laboratories. OEDI aims to make data actionable and discoverable by researchers and industry to accelerate analysis and advance innovation.

Details →

Usage examples

See 9 usage examples →

Digital Earth Africa ALOS PALSAR, ALOS-2 PALSAR-2 and JERS-1

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac, synthetic aperture radar

The ALOS/PALSAR annual mosaic is a global 25 m resolution dataset that combines data from many images captured by JAXA’s PALSAR and PALSAR-2 sensors on the ALOS-1 and ALOS-2 satellites, respectively. This product contains L-band radar measurements in HH and HV polarizations. It is available annually for 2007 to 2010 (ALOS/PALSAR) and 2015 to 2020 (ALOS-2/PALSAR-2). The JERS annual mosaic is generated from images acquired by the SAR sensor on the Japanese Earth Resources Satellite-1 (JERS-1) satellite. This product contains radar measurement in L-band and H...
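Mosaic pixels are typically stored as digital numbers (DN) rather than backscatter. JAXA documents the conversion for its PALSAR/PALSAR-2 mosaics as γ⁰ (dB) = 10·log10(DN²) + CF, with calibration factor CF = −83.0 dB. This minimal sketch assumes that conversion applies to the DE Africa copy and treats DN 0 as no-data. The JERS-1 product uses a different calibration factor, so check its documentation before reusing this helper. `dn_to_gamma0_db` is a hypothetical name.

```python
import math

CF_PALSAR = -83.0  # dB, assumed calibration factor for PALSAR/PALSAR-2 mosaics


def dn_to_gamma0_db(dn: int, cf: float = CF_PALSAR):
    """gamma0 (dB) = 10 * log10(DN^2) + CF; DN 0 is treated as no-data."""
    return 10.0 * math.log10(dn * dn) + cf if dn > 0 else None


print(round(dn_to_gamma0_db(5000), 2))
```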

Details →

Usage examples

See 9 usage examples →

Digital Earth Africa Cropland Extent Map (2019)

agriculture, cog, deafrica, earth observation, food security, geospatial, satellite imagery, stac, sustainability

Digital Earth Africa's cropland extent map (2019) shows the estimated location of croplands in Africa for the period January to December 2019. Cropland is defined as: "a piece of land of minimum 0.01 ha (a single 10m x 10m pixel) that is sowed/planted and harvest-able at least once within the 12 months after the sowing/planting date." This definition excludes non-planted grazing lands and perennial crops, which can be difficult for satellite imagery to differentiate from natural vegetation. This provisional cropland extent map has a resolution of 10 m, and was built using Cope...

Details →

Usage examples

See 9 usage examples →

Digital Earth Africa Fractional Cover

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac, sustainability

Fractional cover (FC) describes the landscape in terms of coverage by green vegetation, non-green vegetation (including deciduous trees during autumn, dry grass, etc.) and bare soil. It provides insight into how areas of dry vegetation and/or bare soil and green vegetation are changing over time. The product is derived from Landsat satellite data, using an algorithm developed by the Joint Remote Sensing Research Program. Digital Earth Africa's FC service has two components. Fractional Cover is estimated from each Landsat scene, providing measurements from individual days. Fractional Cover...

Details →

Usage examples

See 9 usage examples →

Digital Earth Africa Monthly Normalised Difference Vegetation Index (NDVI) Anomaly

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac

Digital Earth Africa’s Monthly NDVI Anomaly service provides an estimate of vegetation condition for each calendar month against the long-term baseline condition measured for that month from 1984 to 2020 in the NDVI Climatology. A standardised anomaly is calculated by subtracting the long-term mean from an observation of interest and then dividing the result by the long-term standard deviation. Positive NDVI anomaly values indicate that vegetation is greener than average and are usually due to increased rainfall in a region. Negative values indicate additional plant stress relative to t...

Details →

Usage examples

See 9 usage examples →
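The standardised anomaly described in the Monthly NDVI Anomaly entry above, (observation minus long-term monthly mean) divided by the long-term monthly standard deviation, can be sketched in Python as follows. This is a minimal illustration, not Digital Earth Africa's implementation: it assumes per-pixel NDVI values supplied as hypothetical `(year, month, value)` tuples, restricts the baseline to 1984 to 2020 as the entry describes, and assumes the population standard deviation (`statistics.pstdev`). The function names are invented for this example.

```python
from statistics import mean, pstdev


def monthly_climatology(series, start_year=1984, end_year=2020):
    """Build a per-calendar-month baseline from (year, month, ndvi) tuples.

    Hypothetical helper: returns {month: (mean, std)} using only observations
    inside the baseline period [start_year, end_year]. Using the population
    standard deviation is an assumption; the product may use another estimator.
    """
    by_month = {}
    for year, month, value in series:
        if start_year <= year <= end_year:
            by_month.setdefault(month, []).append(value)
    return {m: (mean(vals), pstdev(vals)) for m, vals in by_month.items()}


def standardised_anomaly(value, month, climatology):
    """(observation - long-term mean) / long-term standard deviation."""
    mu, sigma = climatology[month]
    if sigma == 0:
        # A flat baseline has no spread, so the anomaly is undefined.
        raise ValueError(f"zero standard deviation for month {month}")
    return (value - mu) / sigma


if __name__ == "__main__":
    # Toy January baseline for a single pixel (illustrative values only).
    baseline = [(2001, 1, 0.3), (2002, 1, 0.5)]
    clim = monthly_climatology(baseline)
    # A positive result means greener than the January baseline.
    print(standardised_anomaly(0.6, 1, clim))
```

With the toy January baseline (mean 0.4, standard deviation 0.1), an observation of 0.6 gives an anomaly of about +2 (greener than average), while 0.2 would give about -2 (more plant stress than average).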

KyFromAbove on AWS

aerial imagery, cog, disaster response, dtm, earth observation, elevation, geopackage, geospatial, lidar, mapping, stac, tiff, tiles

The KyFromAbove initiative is focused on building and maintaining a current basemap for Kentucky that can meet the needs of its users at the state, federal, local, and regional level. A common basemap, including current color leaf-off aerial photography and elevation data (LiDAR), reduces the cost of developing GIS applications, promotes data sharing, and adds efficiencies to many business processes. All basemap data acquired through this effort is made available in the public domain. KyFromAbove acquires aerial imagery and LiDAR during leaf-off conditions in the Commonwealth. The imagery typic...

Details →

Usage examples

See 9 usage examples →

Louisiana Watershed Initiative (LWI) Model Data

bathymetry, climate, coastal, disaster response, elevation, floods, forecast, geospatial, hydrologic model, hydrology, infrastructure, land cover, land use, mapping, meteorological, model, open source software, precipitation, simulations, sustainability, water, weather

Geographic (land cover, land elevation, etc.), meteorologic (pluvial, wind, etc.), hydrologic (fluvial, tidal, etc.), hydrodynamic (water surface elevations, flow velocities), and built environment (structures, levees, floodgates, culverts) data used as inputs to and outputs from numerical modeling software for the prediction of flood risk in stochastic and probabilistic frameworks. This data was collected from open sources, such as the National Oceanic and Atmospheric Administration (NOAA) and the United States Geological Survey (USGS). The format of these data is modified to su...

Details →

Usage examples

See 9 usage examples →

NREL Wind Integration National Dataset

environmental, geospatial, meteorological

Released to the public as part of the Department of Energy's Open Energy Data Initiative, the Wind Integration National Dataset (WIND) is an update and expansion of the Eastern Wind Integration Data Set and Western Wind Integration Data Set. It supports the next generation of wind integration studies.

Details →

Usage examples

See 9 usage examples →

Open NeuroData

array tomography, biology, electron microscopy, image processing, life sciences, light-sheet microscopy, magnetic resonance imaging, neuroimaging, neuroscience

This bucket contains multiple neuroimaging datasets (as Neuroglancer Precomputed Volumes) across multiple modalities and scales, ranging from nanoscale (electron microscopy) to microscale (cleared light-sheet microscopy and array tomography) and mesoscale (structural and functional magnetic resonance imaging). Additionally, many of the datasets include segmentations and meshes.

Details →

Usage examples

See 9 usage examples →

PubSeq - Public Sequence Resource

bam, bioinformatics, biology, coronavirus, COVID-19, fast5, fasta, fastq, genetic, genomic, health, json, life sciences, long read sequencing, medicine, MERS, metadata, open source software, RDF, SARS, SARS-CoV-2, SPARQL

COVID-19 PubSeq is a free and open online bioinformatics public sequence resource with on-the-fly analysis of sequenced SARS-CoV-2 samples that allows for a quick turnaround in identification of new virus strains. PubSeq allows anyone to upload sequence material in the form of FASTA or FASTQ files with accompanying metadata through the web interface or REST API.

Details →

Usage examples

See 9 usage examples →

Southern California Earthquake Data

earth observation, earthquakes, seismology

This dataset contains ground motion velocity and acceleration seismic waveforms recorded by the Southern California Seismic Network (SCSN) and archived at the Southern California Earthquake Data Center (SCEDC). A Distributed Acoustic Sensing (DAS) dataset is also included.

Details →

Usage examples

See 9 usage examples →

Steinegger Lab Datasets

bioinformatics, life sciences, metagenomics, open source software, protein, protein folding

The Steinegger Lab Dataset comprises biological databases and resources critical for protein sequence and structure analysis, developed to support ColabFold, MMseqs2, and Foldseek/Foldcomp, three high-performance computational tools widely used in bioinformatics. The MMseqs2 dataset serves as the backbone for our fast structure prediction tool, ColabFold, and includes UniRef30, BFD, and the ColabFold environmental databases. These datasets are specifically designed for the rapid generation of multiple sequence alignments (MSAs), which are essential for high-accuracy structure prediction. Beyond ...

Details →

Usage examples

See 9 usage examples →

USGS 3DEP LiDAR Point Clouds

agriculture, disaster response, elevation, geospatial, lidar, stac

The goal of the USGS 3D Elevation Program (3DEP) is to collect elevation data in the form of light detection and ranging (LiDAR) data over the conterminous United States, Hawaii, and the U.S. territories, with data acquired over an 8-year period. This dataset provides two realizations of the 3DEP point cloud data. The first resource is a public access organization provided in Entwine Point Tiles format, which is a lossless, full-density, streamable octree based on LASzip (LAZ) encoding. The second resource is a Requester Pays of the original, Raw LAZ (Compressed LAS) 1.4 3DEP format, and more co...

Details →

Usage examples

See 9 usage examples →

World Bank - Light Every Night

cog, disaster response, earth observation, satellite imagery, stac

Light Every Night (World Bank Nighttime Light Data) provides open access to all nightly imagery and data from the Visible Infrared Imaging Radiometer Suite Day-Night Band (VIIRS DNB) from 2012 to 2020 and the Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS) from 1992 to 2013. The underlying data are sourced from the NOAA National Centers for Environmental Information (NCEI) archive. Additional processing by the University of Michigan enables access in Cloud Optimized GeoTIFF (COG) format and search using the SpatioTemporal Asset Catalog (STAC) standard. The data is...

Details →

Usage examples

See 9 usage examples →

nuScenes

autonomous vehicles, computer vision, lidar, robotics, transportation, urban

Public large-scale dataset for autonomous driving. It enables researchers to study challenging urban driving situations using the full sensor suite of a real self-driving car.

Details →

Usage examples

See 9 usage examples →

ArcticDEM

cog, earth observation, elevation, geospatial, mapping, open source software, satellite imagery, stac

ArcticDEM - 2m GSD Digital Elevation Models (DEMs) and mosaics from 2007 to the present. The ArcticDEM project seeks to fill the need for high-resolution time-series elevation data in the Arctic. The time-dependent nature of the strip DEM files allows users to perform change detection analysis and to compare observations of topography data acquired in different seasons or years. The mosaic DEM tiles are assembled from multiple strip DEMs with the intention of providing a more consistent and comprehensive product over large areas. ArcticDEM data is constructed from in-track and cross-track high...

Details →

Usage examples

See 8 usage examples →

Boreas Autonomous Driving Dataset

autonomous vehicles, computer vision, lidar, robotics

This autonomous driving dataset includes data from a 128-beam Velodyne Alpha-Prime lidar, a 5MP Blackfly camera, a 360-degree Navtech radar, and post-processed Applanix POS LV GNSS data. This dataset was collected in various weather conditions (sun, rain, snow) over the course of a year. The intended purpose of this dataset is to enable benchmarking of long-term all-weather odometry and metric localization across various sensor types. In the future, we hope to also support an object detection benchmark.

Details →

Usage examples

See 8 usage examples →

CHAMMI-75

biology, cell imaging, fluorescence imaging, high-throughput imaging, imaging, life sciences, machine learning, microscopy

Quantifying cell morphology using images and machine learning models has proven to be a powerful tool to study the response of cells to treatments. However, the models used to quantify cellular morphology are typically trained with a single microscopy imaging type and under controlled experimental conditions. This results in specialized models that cannot be reused across biological studies because the technical specifications do not match (e.g., different number of channels), or because the target experimental conditions are out of distribution. We have created CHAMMI-75, a large-scale dat...

Details →

Usage examples

See 8 usage examples →

CMIP6 GCMs downscaled using WRF

agriculture, atmosphere, climate, earth observation, environmental, model, oceans, simulations, weather

High-resolution historical and future climate simulations from 1980 to 2100.

Details →

Usage examples

See 8 usage examples →

Cancer Cell Line Encyclopedia (CCLE)

cancer, genetic, genomic, Homo sapiens, life sciences, STRIDES, transcriptomics, whole genome sequencing

The Cancer Cell Line Encyclopedia (CCLE) project is an effort to conduct a detailed genetic characterization of a large panel of human cancer cell lines. The CCLE provides public access to genomic data, visualization and analysis for over 1100 cancer cell lines. This dataset contains RNA-Seq Aligned Reads, WXS Aligned Reads, and WGS Aligned Reads data.

Details →

Usage examples

See 8 usage examples →

DOE's Water Power Technology Office's (WPTO) US Wave dataset

earth observation, energy, geospatial, meteorological, water

Released to the public as part of the Department of Energy's Open Energy Data Initiative, this is the highest-resolution publicly available long-term wave hindcast dataset that, when complete, will cover the entire U.S. Exclusive Economic Zone (EEZ).

Details →

Usage examples

See 8 usage examples →

Digital Earth Africa Normalised Difference Vegetation Index (NDVI) Climatology

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac

Digital Earth Africa’s NDVI climatology product represents the long-term average baseline condition of vegetation for every Landsat pixel over the African continent. Both mean and standard deviation NDVI climatologies are available for each calendar month. Some key features of the product are: ...

Logan Unitigs and Contigs of the Sequence Read Archive (SRA) on AWS

fasta, genetic, genomic, life sciences, metagenomics, STRIDES, transcriptomics, whole exome sequencing, whole genome sequencing

This repository is a re-analysis of the NCBI Sequence Read Archive (SRA), December 2023 freeze, to make it more accessible. The SRA is an open access database of biological sequences, containing raw data from high-throughput DNA and RNA sequencing platforms. It is the largest database of public DNA sequences worldwide, containing a wealth of genomic diversity across all living organisms. This repository contains Logan, a set of compressed FASTA files for all individual SRA accessions, in the form of unitigs and contigs. Borrowing methods from the realm of genome assembly, unitigs preserve near...

Details →

Usage examples

See 8 usage examples →

NASA Earth Exchange (NEX) Data Collection

climate, CMIP5, natural resource, sustainability

A collection of downscaled climate change projections, derived from the General Circulation Model (GCM) runs conducted under the Coupled Model Intercomparison Project Phase 5 (CMIP5) [Taylor et al. 2012] and across the four greenhouse gas emissions scenarios known as Representative Concentration Pathways (RCPs) [Meinshausen et al. 2011]. The NASA Earth Exchange group maintains the NEX-DCP30 (CMIP5), NEX-GDDP (CMIP5), and LOCA (CMIP5).

Details →

Usage examples

See 8 usage examples →

NIH Roadmap Epigenomics

bioinformatics, biology, epigenomics, genetic, genomic, life sciences

The NIH Roadmap Epigenomics Mapping Consortium was launched with the goal of producing a public resource of human epigenomic data to catalyze basic biology and disease-oriented research. The project has generated high-quality, genome-wide maps of several key histone modifications, chromatin accessibility, DNA methylation and mRNA expression across hundreds of human cell types and tissues. To see what data is available, please check the directory listing: https://roadmapepigenomics.s3.us-west-2.amazonaws.com/index.html.

Details →

Usage examples

See 8 usage examples →

NOAA Water-Column Sonar Data Archive

biodiversity, earth observation, ecosystems, environmental, geospatial, mapping, oceans

Water-column sonar data archived at the NOAA National Centers for Environmental Information.

Details →

Usage examples

See 8 usage examples →

New Zealand Elevation

cog, earth observation, elevation, geospatial, stac

The New Zealand Elevation dataset consists of New Zealand's publicly owned digital elevation models and digital surface models, which are freely available to use under an open licence. The dataset contains 1m resolution grids derived from LiDAR data. Point clouds are not included in the initial release. All of the elevation files are Cloud Optimised GeoTIFFs using LERC compression for the main grid and LERC compression with a lower max_z_error for the overviews. These elevation files are accompanied by...

Details →

Usage examples

See 8 usage examples →

Northern California Earthquake Data

earth observation, earthquakes, seismology

This dataset contains various types of digital data relating to earthquakes in central and northern California. Time series data come from broadband, short period, and strong motion seismic sensors, GPS, and other geophysical sensors.

Details →

Usage examples

See 8 usage examples →

Open CEDA by Watershed

carbon, climate, EEIO, scope 3, spend-based models, supply chain

CEDA is a multi-regional Environmentally-Extended Input-Output (EEIO) model developed to support a wide range of environmental systems analyses, including corporate carbon accounting and sustainable spend analysis. CEDA provides unparalleled global coverage and granularity, representing 95% of the world's GDP across 148 countries and 400 sectors, enabling robust and geographically comprehensive Scope 3 greenhouse gas (GHG) measurement. Open CEDA is the publicly available version of CEDA, now easy to download and available for free for all use cases. For more information please visit our w...

Details →

Usage examples

See 8 usage examples →

Radiant MLHub

cog, earth observation, environmental, geospatial, labeled, machine learning, satellite imagery, stac

Radiant MLHub is an open library for geospatial training data that hosts datasets generated by Radiant Earth Foundation's team as well as other training data catalogs contributed by Radiant Earth’s partners. Radiant MLHub is open to anyone to access, store, register and/or share their training datasets for high-quality Earth observations. All of the training datasets are stored using a SpatioTemporal Asset Catalog (STAC) compliant catalog and exposed through a common API. Training datasets include pairs of imagery and labels for different types of machine learning problems including image ...

Details →

Usage examples

See 8 usage examples →

Reference Elevation Model of Antarctica (REMA)

cog, earth observation, elevation, geospatial, mapping, open source software, satellite imagery, stac

The Reference Elevation Model of Antarctica - 2m GSD Digital Elevation Models (DEMs) and mosaics from 2009 to the present. The REMA project seeks to fill the need for high-resolution time-series elevation data in the Antarctic. The time-dependent nature of the strip DEM files allows users to perform change detection analysis and to compare observations of topography data acquired in different seasons or years. The mosaic DEM tiles are assembled from multiple strip DEMs with the intention of providing a more consistent and comprehensive product over large areas. REMA data is constructed from in...

Details →

Usage examples

See 8 usage examples →

Toxicant Exposures and Responses by Genomic and Epigenomic Regulators of Transcription (TaRGET)

bioinformatics, biology, environmental, epigenomics, genetic, genomic, life sciences

The TaRGET (Toxicant Exposures and Responses by Genomic and Epigenomic Regulators of Transcription) Program is a research consortium funded by the National Institute of Environmental Health Sciences (NIEHS). The goal of the collaboration is to address the role of environmental exposures in disease pathogenesis as a function of epigenome perturbation, including understanding the environmental control of epigenetic mechanisms and assessing the utility of surrogate tissue analysis in mouse models of disease-relevant environmental exposures.

Details →

Usage examples

See 8 usage examples →

U.S. Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure High Throughput Transcriptomics Data

bioinformatics, fastq, gene expression, transcriptomics

High-throughput transcriptomics (HTTr) data generated by US EPA Office of Research and Development, Center for Computational Toxicology and Exposure (CCTE), Biomolecular and Computational Toxicology Division. All data is generated using TempO-Seq targeted RNA-seq technology from in vitro cell culture systems.

Details →

Usage examples

See 8 usage examples →

ASTER L1T Cloud-Optimized GeoTIFFs

cog, earth observation, geospatial, mining, natural resource, satellite imagery, sustainability

The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Level 1 Precision Terrain Corrected Registered At-Sensor Radiance (AST_L1T) data contains calibrated at-sensor radiance, corresponding to the ASTER Level 1B (AST_L1B) product, that has been geometrically corrected and rotated to a north-up UTM projection. The AST_L1T is created from a single resampling of the corresponding ASTER L1A (AST_L1A) product. The precision terrain correction process incorporates GLS2000 digital elevation data with derived ground control points (GCPs) to achieve topographic accuracy for all daytim...

Details →

Usage examples

See 7 usage examples →

BossDB Open Neuroimagery Datasets

calcium imaging, electron microscopy, imaging, life sciences, light-sheet microscopy, magnetic resonance imaging, neuroimaging, neuroscience, volumetric imaging, x-ray, x-ray microtomography, x-ray tomography

This data ecosystem, Brain Observatory Storage Service & Database (BossDB), contains several neuroimaging datasets across multiple modalities and scales, ranging from nanoscale (electron microscopy) to microscale (cleared light-sheet microscopy and array tomography) and mesoscale (structural and functional magnetic resonance imaging). Additionally, many of the datasets include dense segmentations and meshes.

Details →

Usage examples

See 7 usage examples →

CIViC (Clinical Interpretation of Variants in Cancer)

cancer, genetic, genomic, life sciences, vcf

Precision medicine refers to the use of prevention and treatment strategies that are tailored to the unique features of each individual and their disease. In the context of cancer this might involve the identification of specific mutations shown to predict response to a targeted therapy. The biomedical literature describing these associations is large and growing rapidly. Currently these interpretations exist largely in private or encumbered databases resulting in extensive repetition of effort. Realizing precision medicine will require this information to be centralized, debated and interpret...

Details →

Usage examples

See 7 usage examples →

Clinical Proteomic Tumor Analysis Consortium 2 (CPTAC-2)

cancer, genomic, life sciences, STRIDES, transcriptomics

The Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics. CPTAC-2 is Phase II of the CPTAC Initiative (2011-2016). Datasets contain open RNA-Seq Gene Expression Quantification, miRNA-Seq Isoform Expression Quantification, and miRNA Expression Quantification data.

Details →

Usage examples

See 7 usage examples →

Coupled Model Intercomparison Project 6

agriculture, atmosphere, climate, earth observation, environmental, model, oceans, simulations, weather

The sixth phase of the global coupled ocean-atmosphere general circulation model ensemble.

Details →

Usage examples

See 7 usage examples →

Earth Observation Data Cubes for Brazil

cog, earth observation, geoscience, geospatial, image processing, open source software, satellite imagery, stac

Earth observation (EO) data cubes produced from analysis-ready data (ARD) of CBERS-4, Sentinel-2 A/B and Landsat-8 satellite images for Brazil. The datacubes are regular in time and use a hierarchical tiling system. Further details are described in Ferreira et al. (2020).

Details →

Usage examples

See 7 usage examples →

Global Database of Events, Language and Tone (GDELT)

disaster response, events

This project monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, counts, themes, sources, emotions, quotes, images and events driving our global society every second of every day.

Details →

Usage examples

See 7 usage examples →

IBL Neuropixels Reproducible Ephys Data on AWS

life sciences, Mus musculus, neurophysiology, neuroscience, open source software

Electrophysiological recordings acquired using Neuropixels probes in different mice and labs, targeting the same brain locations (including posterior parietal cortex, hippocampus, and thalamus).

Details →

Usage examples

See 7 usage examples →

ICGC on AWS

bam, cancer, genetic, genomic, life sciences, vcf

The International Cancer Genome Consortium (ICGC) coordinates projects with the common aim of accelerating research into the causes and control of cancer. The PanCancer Analysis of Whole Genomes (PCAWG) study is an international collaboration to identify common patterns of mutation in whole genomes from ICGC. More than 2,400 consistently analyzed genomes corresponding to over 1,100 unique ICGC donors are now freely available on Amazon S3 to credentialed researchers subject to ICGC data sharing policies.

Details →

Usage examples

See 7 usage examples →

Materials Project Data

chemistry, cloud computing, data assimilation, digital assets, digital preservation, energy, environmental, free software, genome, HPC, information retrieval, infrastructure, json, machine learning, materials science, molecular dynamics, molecule, open source software, physics, post-processing, x-ray crystallography

Materials Project is an open database of computed materials properties aiming to accelerate materials science research. The resources in this OpenData dataset contain the raw, parsed, and build data products.

Details →

Usage examples

See 7 usage examples →

NOAA National Water Model CONUS Retrospective Dataset

agriculture, climate, disaster response, environmental, transportation, weather

The NOAA National Water Model Retrospective dataset contains input and output from multi-decade CONUS retrospective simulations. These simulations used meteorological input fields from meteorological retrospective datasets. The output frequency and fields available in this historical NWM dataset differ from those contained in the real-time operational NWM forecast model. Additionally, note that no streamflow or other data assimilation is performed within any of the NWM retrospective simulations.

One application of this dataset is to provide historical context to current near real-time streamflow, soil moisture and snowpack conditions. The retrospective data can be used to infer flow frequencies and perform temporal analyses with hourly streamflow output and 3-hourly land surface output. This dataset can also be used in the development of end user applications which require a long baseline of data for system training or verification purposes.

...

Details →

Usage examples

See 7 usage examples →

OpenAQ

air quality, cities, environmental, geospatial

Global, aggregated physical air quality data from public data sources provided by government, research-grade and other sources. These awesome groups do the hard work of measuring these data and publicly sharing them, and our community makes them more universally-accessible to both humans and machines.

Details →

Usage examples

See 7 usage examples →

Scottish Public Sector LiDAR Dataset

cities, coastal, cog, elevation, environmental, lidar, urban

This dataset consists of lidar data collected by the Scottish public sector and made available under the Open Government Licence. The data are available as point clouds (LAS format or LAZ compressed format), along with the derived Digital Terrain Model (DTM) and Digital Surface Model (DSM) products as Cloud Optimized GeoTIFFs (COG) or standard GeoTIFFs. The dataset contains multiple subsets of data which were each commissioned and flown in response to different organisational requirements. The details of each can be found at https://remotesensingdata.gov.scot/data#/list

Details →

Usage examples

See 7 usage examples →

SnpEff & SnpSift Genomic Variant Annotation Databases

bioinformatics, cancer, genetic, genome, genomic, life sciences, protein, structural variation, transcriptomics, variant annotation, vcf, whole exome sequencing, whole genome sequencing

SnpEff is a variant annotation and effect prediction tool that annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes). It supports over 38,000 genomes and provides comprehensive genomic databases for variant annotation. The databases include reference genomes, gene annotations, protein sequences, and regulatory elements from trusted sources like ENSEMBL, RefSeq, and UCSC. SnpSift complements SnpEff by providing tools to annotate genomic variants using databases, filter large genomic datasets, and manipulate annotated variants. Together, these ...

Details →

Usage examples

See 7 usage examples →

nuPlan

autonomous vehicles, lidar, robotics, transportation, urban

nuPlan is the world's first large-scale planning benchmark for autonomous driving.

Details →

Usage examples