Top 40 Free Dataset Sources for Data Analysis Projects
Pooja Pawar
Here are 40 free and reliable sources of datasets that you can use
for data analysis projects:
1. Kaggle
Link: Kaggle Datasets
Overview: Kaggle offers a vast collection of datasets on a variety
of topics including finance, healthcare, sports, and more. It also
provides a community for data analysts and data scientists to
collaborate.
2. Google Dataset Search
Link: Google Dataset Search
Overview: Google Dataset Search helps you find datasets stored
across the web. It's particularly helpful for finding scientific and
academic datasets.
3. UCI Machine Learning Repository
Link: UCI Repository
Overview: UCI provides datasets widely used for machine
learning and data analysis projects, many of them documented
with descriptions of their attributes and origin.
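Many older UCI files, such as the classic Iris data, are distributed as comma-separated `.data` files with no header row, with the column names given in a separate description file. The sketch below assumes that layout. The column names are hypothetical labels chosen for illustration, and the three rows are written inline so the example runs offline. In practice you would read the downloaded file instead.

```python
import csv
import io

# Hypothetical column names, as they might be listed in a dataset's description file.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

# Three rows in the headerless comma-separated style of many UCI .data files.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def load_headerless_csv(text, columns):
    """Parse headerless CSV text into a list of dicts keyed by `columns`."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines, which some files end with
            continue
        row = dict(zip(columns, record))
        # Convert every field except the final class label to a float.
        for name in columns[:-1]:
            row[name] = float(row[name])
        rows.append(row)
    return rows


rows = load_headerless_csv(SAMPLE, COLUMNS)
print(len(rows), rows[0]["species"], rows[2]["petal_length"])
```

If pandas is available, `pd.read_csv(path, header=None, names=COLUMNS)` does the same job in one call.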
4. data.gov
Link: Data.gov
Overview: The US government’s open data portal with datasets
on a wide range of topics including health, agriculture, energy,
and education.
5. AWS Open Data Registry
Link: AWS Open Data
Overview: AWS hosts publicly available datasets for analysis,
including satellite imagery, healthcare data, and genomic data.
6. World Bank Open Data
Link: World Bank
Overview: The World Bank provides a wide range of data focused on
global development, including education, health, and economic
data.
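World Bank indicators can also be fetched programmatically. The sketch below assumes the World Bank API v2 URL pattern `https://api.worldbank.org/v2/country/{country}/indicator/{indicator}?format=json` and its JSON response shape: a two-element list of paging metadata followed by the observations. `SP.POP.TOTL` is the indicator code for total population. To stay offline, the example parses a small hand-written response whose numbers are placeholders, not real statistics.

```python
import json
from urllib.parse import urlencode

BASE = "https://api.worldbank.org/v2"


def indicator_url(country, indicator, date_range=None):
    """Build a World Bank API v2 URL (pattern assumed from the public API)."""
    params = {"format": "json"}
    if date_range:
        params["date"] = date_range  # e.g. "2015:2020" for a range of years
    return f"{BASE}/country/{country}/indicator/{indicator}?{urlencode(params)}"


def parse_observations(payload):
    """Return {year: value} from a response shaped like [metadata, observations].

    Error responses have a different shape, which this sketch does not handle.
    """
    _metadata, observations = json.loads(payload)
    return {
        int(obs["date"]): obs["value"]
        for obs in observations
        if obs["value"] is not None  # years without data come back as null
    }


# Hand-written response with placeholder values (not real statistics).
SAMPLE = """[
  {"page": 1, "pages": 1, "per_page": 50, "total": 3},
  [
    {"date": "2022", "value": 300, "country": {"id": "XX", "value": "Example"}},
    {"date": "2021", "value": 290, "country": {"id": "XX", "value": "Example"}},
    {"date": "2020", "value": null, "country": {"id": "XX", "value": "Example"}}
  ]
]"""

print(indicator_url("IN", "SP.POP.TOTL", "2015:2020"))
print(parse_observations(SAMPLE))
```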
7. FiveThirtyEight
Link: FiveThirtyEight
Overview: FiveThirtyEight shares the data behind its articles,
including datasets on politics, sports, economics, and
culture.
8. Google Cloud Public Datasets
Link: Google Cloud Datasets
Overview: Google Cloud provides access to a variety of large,
publicly available datasets, often used for big data and machine
learning projects.
9. Open Data Portals (various countries)
Links:
o UK Data
o Canada Data
o New Zealand Data
Overview: Many governments provide public datasets on a wide
variety of topics such as population demographics,
transportation, and public health.
10. Academic Torrents
Link: Academic Torrents
Overview: A platform for sharing large scientific datasets,
particularly useful for those working in academia or on
data-heavy projects.
11. IMDb Datasets
Link: IMDb Datasets
Overview: Offers a wide range of datasets related to movies,
actors, and ratings.
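IMDb's non-commercial datasets are published as gzip-compressed, tab-separated files (for example `title.ratings.tsv.gz`, with `tconst`, `averageRating` and `numVotes` columns) in which a missing value is written as `\N`. The sketch below assumes that format. So that it runs offline, it compresses a small hand-written sample in memory; the title IDs and numbers in it are made up.

```python
import csv
import gzip
import io

# Made-up rows in the layout assumed for title.ratings.tsv.gz.
SAMPLE_TSV = (
    "tconst\taverageRating\tnumVotes\n"
    "tt0000001\t5.7\t2000\n"
    "tt0000002\t\\N\t\\N\n"
    "tt0000003\t8.1\t150\n"
)


def read_imdb_tsv(fileobj):
    """Yield dict rows from a gzipped IMDb-style TSV, mapping \\N to None."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8", newline="") as text:
        # QUOTE_NONE: treat quote characters inside fields as literal text.
        reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield {key: (None if value == "\\N" else value) for key, value in row.items()}


# Compress the sample in memory to stand in for the downloaded file.
compressed = io.BytesIO(gzip.compress(SAMPLE_TSV.encode("utf-8")))
rows = list(read_imdb_tsv(compressed))

# Keep only titles that have a rating with at least 1,000 votes.
popular = [r["tconst"] for r in rows if r["numVotes"] is not None and int(r["numVotes"]) >= 1000]
print(popular)
```

With a real download, pass the file path (for example `read_imdb_tsv("title.ratings.tsv.gz")`) instead of the in-memory buffer; `gzip.open` accepts either.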
12. Quandl (now Nasdaq Data Link)
Link: Quandl
Overview: A repository of financial, economic, and alternative
datasets, some of which are available for free.
13. Awesome Public Datasets
Link: Awesome Public Datasets
Overview: A curated list of public datasets across various
domains such as biology, finance, healthcare, and climate.
14. OpenStreetMap
Link: OpenStreetMap
Overview: A free, editable map of the world whose geographic
data is useful for location-based analyses.
15. Yahoo Finance
Link: Yahoo Finance
Overview: Provides financial data for stocks, bonds,
commodities, and currencies, helpful for data analysis in finance.
16. UN Data
Link: UN Data
Overview: The United Nations provides a wide range of datasets
covering topics such as agriculture, education, health, and
economics across various countries.
17. Eurostat
Link: Eurostat
Overview: Eurostat is the statistical office of the European
Union. It offers free access to comprehensive and detailed data
on the economy, environment, and population in the EU.
18. Pew Research Center
Link: Pew Research Datasets
Overview: Pew Research Center shares survey datasets on social
trends, politics, and global attitudes.
19. Harvard Dataverse
Link: Harvard Dataverse
Overview: Harvard Dataverse offers a repository where
researchers share, publish, and analyze datasets across multiple
fields such as law, medicine, social sciences, and arts.
20. TidyTuesday (R for Data Science)
Link: TidyTuesday
Overview: TidyTuesday is a weekly data project aimed at the R
community. Each week a new dataset on a different topic is posted
so that participants can practice data wrangling and visualization.
21. Awesome Open Government Data
Link: Awesome Open Government Data
Overview: A collection of open government datasets from
various countries, covering topics such as the environment,
finance, and health.
22. Gapminder
Link: Gapminder
Overview: Gapminder offers data on global development,
focusing on human development indicators such as health,
education, and economics.
23. DBpedia
Link: DBpedia
Overview: DBpedia extracts structured content from Wikipedia
and makes it available as a dataset for projects requiring
semantic data.
24. CMU StatLib
Link: CMU StatLib
Overview: Carnegie Mellon University's StatLib provides a
collection of datasets useful for statistical analysis and machine
learning.
25. OpenML
Link: OpenML
Overview: OpenML is a platform where you can explore, upload,
and share machine learning datasets, particularly useful for
benchmarking algorithms and comparing results.
26. DataHub
Link: DataHub
Overview: DataHub offers a collection of high-quality datasets
across various domains such as finance, health, and education.
27. Humanitarian Data Exchange (HDX)
Link: HDX
Overview: The Humanitarian Data Exchange offers datasets
relevant to humanitarian crises, including food security,
refugees, and health.
28. The Global Health Observatory (GHO)
Link: Global Health Observatory
Overview: Provided by the World Health Organization, GHO
offers global health statistics on topics such as mortality,
health systems, and diseases.
29. NOAA (National Oceanic and Atmospheric Administration)
Link: NOAA Datasets
Overview: NOAA provides a wealth of environmental data,
including climate, ocean, and atmospheric data for scientific
research and analysis.
30. NASA Open Data
Link: NASA Open Data
Overview: NASA offers datasets related to space, Earth science,
and astronomy, which can be used for analysis or research
projects.
31. Public Tableau Datasets
Link: Tableau Public
Overview: Tableau Public allows users to explore visualizations
and datasets across various domains like healthcare, finance,
and education, which can be downloaded for analysis.
32. Statista
Link: Statista
Overview: Although Statista is mainly a paid service, some of its
statistics are free to access, across industries including
marketing, consumer behavior, and technology.
33. Inside Airbnb
Link: Inside Airbnb
Overview: Provides data about Airbnb listings in various cities
around the world, which is useful for urban planning, real estate
analysis, and tourism studies.
34. Public Datasets on Azure
Link: Azure Public Datasets
Overview: Microsoft Azure hosts a collection of open datasets
for machine learning and AI model training across various
domains like healthcare, financial forecasting, and
transportation.
35. Zillow Real Estate Data
Link: Zillow
Overview: Zillow provides real estate datasets that are useful for
analyzing housing trends, home values, and market dynamics in
the U.S.
36. re3data (Registry of Research Data Repositories)
Link: re3data
Overview: re3data is a global registry that provides access to
research data repositories across all academic disciplines.
37. The World Trade Organization (WTO)
Link: WTO Data
Overview: The WTO offers data on international trade, including
global trade flows, tariffs, and market access.
38. DataSF
Link: DataSF
Overview: A portal providing open datasets from the city of San
Francisco, useful for urban planning, environmental studies,
transportation analysis, and more.
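DataSF is built on the Socrata platform, so its datasets can usually be queried over the SODA API using SoQL parameters such as `$select`, `$where`, `$order` and `$limit`. The sketch below assumes the endpoint pattern `https://{domain}/resource/{dataset_id}.json`. The dataset ID `abcd-1234` and the column names are hypothetical placeholders; copy the real ones from a dataset's API page.

```python
from urllib.parse import urlencode


def soda_query_url(domain, dataset_id, select=None, where=None, order=None, limit=100):
    """Build a Socrata (SODA) query URL for a portal such as data.sfgov.org.

    Only the clauses that are given are included. urlencode percent-encodes
    the "$" prefixes and turns spaces into "+", which standard query-string
    decoding on the server reverses.
    """
    params = {}
    if select:
        params["$select"] = select
    if where:
        params["$where"] = where
    if order:
        params["$order"] = order
    params["$limit"] = str(limit)
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


# "abcd-1234" and the column names below are hypothetical placeholders.
url = soda_query_url(
    "data.sfgov.org",
    "abcd-1234",
    select="neighborhood, incident_date",
    where="incident_date >= '2024-01-01'",
    order="incident_date DESC",
    limit=50,
)
print(url)
```

Building the URL separately from fetching it keeps the query easy to inspect; the result can then be passed to any HTTP client.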
39. Indian Government Open Data
Link: Data.gov.in
Overview: The Indian government's open data platform provides
data on health, agriculture, education, and transportation in
India.
40. Yelp Open Dataset
Link: Yelp Dataset
Overview: Yelp provides a dataset for academic purposes,
containing reviews, business information, and user ratings,
which is widely used for sentiment analysis and natural language
processing.
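The Yelp Open Dataset ships its tables as JSON files with one JSON object per line, such as a reviews file whose records carry `business_id` and `stars` fields. The sketch below assumes that layout. It computes each business's average star rating by streaming the file line by line, so a multi-gigabyte file never has to fit in memory. The three records are invented for illustration.

```python
import io
import json
from collections import defaultdict

# Invented records in the one-object-per-line layout assumed for the reviews file.
SAMPLE_JSONL = """{"review_id": "r1", "business_id": "b1", "stars": 5, "text": "Great food."}
{"review_id": "r2", "business_id": "b1", "stars": 3, "text": "Slow service."}
{"review_id": "r3", "business_id": "b2", "stars": 4, "text": "Nice patio."}
"""


def average_stars(lines):
    """Return {business_id: mean stars} from an iterable of JSON lines."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:  # ignore blank lines
            continue
        review = json.loads(line)
        totals[review["business_id"]] += review["stars"]
        counts[review["business_id"]] += 1
    return {biz: totals[biz] / counts[biz] for biz in totals}


print(average_stars(io.StringIO(SAMPLE_JSONL)))
```

With the real file, pass an open file handle instead, for example `with open(path, encoding="utf-8") as f: average_stars(f)`.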
These sources collectively offer a wide range of data for projects across
many fields, including finance, healthcare, real estate, government
policy, climate, and social sciences.