Free Dataset Sources

The document lists 40 free and reliable sources for datasets suitable for data analysis projects, including platforms like Kaggle, Google Dataset Search, and UCI Machine Learning Repository. Each source is briefly described, highlighting the types of data available and their relevance to various fields such as finance, healthcare, and social sciences. These resources are valuable for researchers, data analysts, and students looking for quality datasets for their projects.

Uploaded by Jonas Sena

Top 40 Free Dataset Sources for Data Analysis Projects

Pooja Pawar
Here are 40 free and reliable sources for datasets that you can use for data analysis projects:

1. Kaggle
• Link: Kaggle Datasets
• Overview: Kaggle offers a vast collection of datasets on a variety of topics including finance, healthcare, sports, and more. It also provides a community for data analysts and data scientists to collaborate.

2. Google Dataset Search
• Link: Google Dataset Search
• Overview: Google Dataset Search helps you find datasets stored across the web. It's particularly helpful for finding scientific and academic datasets.

3. UCI Machine Learning Repository
• Link: UCI Repository
• Overview: UCI provides datasets widely used for machine learning and data analysis projects, many of them accompanied by documentation describing their variables.

4. Data.gov
• Link: Data.gov
• Overview: The US government’s open data portal with datasets on a wide range of topics including health, agriculture, energy, and education.

5. AWS Open Data Registry
• Link: AWS Open Data
• Overview: AWS hosts publicly available datasets for analysis, including satellite imagery, healthcare data, and genomic data.

6. World Bank Open Data
• Link: World Bank
• Overview: The World Bank provides a wide range of data focused on global development, including education, health, and economic data.

7. FiveThirtyEight
• Link: FiveThirtyEight
• Overview: FiveThirtyEight shares the data behind its articles, including datasets on politics, sports, economics, and culture.

8. Google Cloud Public Datasets
• Link: Google Cloud Datasets
• Overview: Google Cloud provides access to a variety of large, publicly available datasets, often used for big data and machine learning projects.

9. Open Data Portals (various countries)
• Links:
  o UK Data
  o Canada Data
  o New Zealand Data
• Overview: Many governments provide public datasets on a wide variety of topics such as population demographics, transportation, and public health.

10. Academic Torrents
• Link: Academic Torrents
• Overview: A platform for sharing large scientific datasets, particularly useful for those working in academia or with data-heavy projects.

11. IMDb Datasets
• Link: IMDb Datasets
• Overview: IMDb offers a wide range of datasets related to movies, actors, and ratings.

12. Quandl (now Nasdaq Data Link)
• Link: Quandl
• Overview: A repository of financial, economic, and alternative datasets, some of which are available for free.

13. Awesome Public Datasets
• Link: Awesome Public Datasets
• Overview: A curated list of public datasets across various domains such as biology, finance, healthcare, and climate.

14. OpenStreetMap
• Link: OpenStreetMap
• Overview: A free editable map of the world, offering geographical data that can be useful for location-based analyses.

15. Yahoo Finance
• Link: Yahoo Finance
• Overview: Provides financial data for stocks, bonds, commodities, and currencies, helpful for data analysis in finance.

16. UN Data
• Link: UN Data
• Overview: The United Nations provides a wide range of datasets covering topics such as agriculture, education, health, and economics across various countries.

17. Eurostat
• Link: Eurostat
• Overview: Eurostat is the statistical office of the European Union. It offers free access to comprehensive and detailed data on the economy, environment, and population in the EU.

18. Pew Research Center
• Link: Pew Research Datasets
• Overview: Pew Research Center shares survey datasets on social trends, politics, and global attitudes.

19. Harvard Dataverse
• Link: Harvard Dataverse
• Overview: Harvard Dataverse offers a repository where researchers share, publish, and analyze datasets across multiple fields such as law, medicine, social sciences, and arts.

20. TidyTuesday (R for Data Science)
• Link: TidyTuesday
• Overview: TidyTuesday is a weekly data project aimed at the R community. Datasets are shared weekly and cover a broad range of topics with the goal of promoting data wrangling and visualization skills.

21. Awesome Open Government Data
• Link: Awesome Open Government Data
• Overview: A collection of open government datasets from various countries, covering areas such as environment, finance, and health.

22. Gapminder
• Link: Gapminder
• Overview: Gapminder offers data on global development, focusing on human development indicators such as health, education, and economics.

23. DBpedia
• Link: DBpedia
• Overview: DBpedia extracts structured content from Wikipedia and makes it available as a dataset for projects requiring semantic data.

24. CMU StatLib
• Link: CMU StatLib
• Overview: Carnegie Mellon University's StatLib provides a collection of datasets useful for statistical analysis and machine learning.

25. OpenML
• Link: OpenML
• Overview: OpenML is a platform where you can explore, upload, and share machine learning datasets, particularly useful for benchmarking algorithms and comparing results.

26. DataHub
• Link: DataHub
• Overview: DataHub offers a collection of high-quality datasets across various domains such as finance, health, and education.

27. Humanitarian Data Exchange (HDX)
• Link: HDX
• Overview: The Humanitarian Data Exchange offers datasets relevant to humanitarian crises, including food security, refugees, and health.

28. The Global Health Observatory (GHO)
• Link: Global Health Observatory
• Overview: Provided by the World Health Organization, GHO offers global health statistics, including mortality rates, health systems, and diseases.

29. NOAA (National Oceanic and Atmospheric Administration)
• Link: NOAA Datasets
• Overview: NOAA provides a wealth of environmental data, including climate, ocean, and atmospheric data for scientific research and analysis.

30. NASA Open Data
• Link: NASA Open Data
• Overview: NASA offers datasets related to space, earth sciences, and astronomy, which can be used for analysis or research projects.

31. Public Tableau Datasets
• Link: Tableau Public
• Overview: Tableau Public allows users to explore visualizations and datasets across various domains like healthcare, finance, and education, which can be downloaded for analysis.

32. Statista
• Link: Statista
• Overview: Though Statista is a paid service, it offers free access to many datasets across various industries, including marketing, consumer behavior, and technology.

33. Inside Airbnb
• Link: Inside Airbnb
• Overview: Provides data about Airbnb listings in various cities around the world, which is useful for urban planning, real estate analysis, and tourism studies.

34. Public Data Sets on Azure
• Link: Azure Public Datasets
• Overview: Microsoft Azure hosts a collection of open datasets for machine learning and AI model training across various domains like healthcare, financial forecasting, and transportation.

35. Zillow Real Estate Data
• Link: Zillow
• Overview: Zillow provides real estate datasets that are useful for analyzing housing trends, home values, and market dynamics in the U.S.

36. re3data (Registry of Research Data Repositories)
• Link: re3data
• Overview: re3data is a global registry that provides access to research data repositories across all academic disciplines.

37. The World Trade Organization (WTO)
• Link: WTO Data
• Overview: The WTO offers data on trade between countries, including global trade flows, tariffs, and market access.

38. DataSF
• Link: DataSF
• Overview: A portal providing open datasets from the city of San Francisco, useful for urban planning, environmental studies, transportation analysis, and more.

39. Indian Government Open Data
• Link: Data.gov.in
• Overview: The Indian government's open data platform provides data on health, agriculture, education, and transportation, specifically for India.

40. Yelp Open Dataset
• Link: Yelp Dataset
• Overview: Yelp provides a dataset for academic purposes, containing reviews, business information, and user ratings, which is widely used for sentiment analysis and natural language processing.

These sources collectively offer a wide range of data for projects across many fields, including finance, healthcare, real estate, government policy, climate, and social sciences.
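
Whichever source you pick, a quick first look at a downloaded file helps confirm that it is usable before you start a project. The following is a minimal Python sketch, not a method prescribed by any of the sources above. It assumes the dataset is a CSV file with a header row; the function name `summarize_csv` and the sample values are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_csv(text, max_rows=1000):
    """Report column names, rows read, and empty cells per column for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    columns = list(reader.fieldnames or [])
    missing = Counter()
    rows = 0
    for row in reader:
        if rows >= max_rows:
            break  # only sample the first max_rows rows of large files
        rows += 1
        for name in columns:
            value = row.get(name)
            # DictReader yields None for cells absent from a short row
            if value is None or value.strip() == "":
                missing[name] += 1
    return {"columns": columns, "rows": rows, "missing": dict(missing)}


# Hypothetical placeholder data standing in for a downloaded file.
SAMPLE = "country,year,value\nA,2019,70.0\nB,2019,\nC,2019,66.5\n"

summary = summarize_csv(SAMPLE)
print(summary)
```

For a real file, you would read its contents first (for example with `open(path, newline="", encoding="utf-8").read()`) and pass the text to the function. Columns with many empty cells are a sign that the dataset needs cleaning before analysis.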
