
Open Data

This document summarizes different sources for open and proprietary datasets. It outlines several categories of open datasets including government, financial, crime, health, academic, and other general data sources. Proprietary datasets contain sensitive data owned by individuals or organizations and have licensing restrictions. The document concludes by describing 12 common dataset license types that determine how datasets can be used and shared.

Uploaded by kaan.eroglu
Copyright © All Rights Reserved

7.02.2024 02:04 about:blank

Reading: Additional Sources of Datasets


Estimated time: 5 mins

In this reading, you will learn about:

Open datasets and sources
Proprietary datasets and sources
Dataset licenses

Open datasets and sources

In this data-driven world, some datasets are freely available for anyone to access, use, modify, and share.
These are called open datasets.
Open datasets are released under a public license and are very useful on your journey as a data scientist.
Some of the most informative sources of open datasets are listed below.

Government Data:

https://www.data.gov/
https://www.census.gov/data.html
https://data.gov.uk/
https://www.opendatanetwork.com/
https://data.un.org/

Financial Data Sources:

https://data.worldbank.org/
https://www.globalfinancialdata.com/
https://comtrade.un.org/
https://www.nber.org/
https://fred.stlouisfed.org/

Crime Data:

https://www.fbi.gov/services/cjis/ucr
https://www.icpsr.umich.edu/icpsrweb/content/NACJD/index.html
https://www.drugabuse.gov/related-topics/trends-statistics
https://www.unodc.org/unodc/en/data-and-analysis/

Health Data:

https://www.who.int/gho/database/en/
https://www.fda.gov/Food/default.htm
https://seer.cancer.gov/faststats/selections.php?series=cancer
https://www.opensciencedatacloud.org/
https://pds.nasa.gov/
https://earthdata.nasa.gov/
https://www.sgim.org/communities/research/dataset-compendium/public-datasets-topic-grid

Academic and Business Data:

https://scholar.google.com/

https://nces.ed.gov/
https://www.glassdoor.com/research/
https://www.yelp.com/dataset

Other General Data:

https://www.kaggle.com/datasets
https://www.reddit.com/r/datasets/

Proprietary datasets and sources


Proprietary datasets contain data primarily owned and controlled by specific individuals or organizations.
Their distribution is limited because the data is sold under a licensing agreement.
Unlike public data, some data from private sources cannot be easily disclosed.

National security data and geological, geophysical, and biological data are examples of proprietary data. This
type of data is usually protected by copyright law or patents. Proprietary datasets that contain mainly sensitive
information are less widely available than open datasets.

Some common proprietary dataset sources are listed below.

Health Care:

https://www.sgim.org/communities/research/dataset-compendium/proprietary-datasets

Financial Market Data:

https://datarade.ai/data-categories/proprietary-market-data

Google Cloud-Based Datasets:

https://cloud.google.com/datasets

Dataset licenses

When you select a dataset, check its license. The license states whether you may use the dataset and which
conditions, if any, you must accept in order to use it. Common license types are listed below.

1. PUBLIC DOMAIN MARK - PUBLIC DOMAIN


When a dataset is marked as Public Domain, everyone has the right to use, access, modify, and share it
without restriction. Technically, there is no license.

2. OPEN DATA COMMONS PUBLIC DOMAIN DEDICATION AND LICENSE – PDDL


The PDDL grants the same freedoms as the Public Domain Mark. The difference is that the PDDL uses a formal
licensing mechanism to grant the rights to the dataset.

3. CREATIVE COMMONS ATTRIBUTION 4.0 INTERNATIONAL - CC-BY


This license allows users to share and modify a dataset, but only if they give credit to the creator(s) of
the dataset.
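Giving credit, as CC-BY requires, is easiest when the credit line is built from the same fields every time. The sketch below follows the Title, Author, Source, License (TASL) pattern that Creative Commons recommends for attribution. The function names `join_creators` and `attribution_line` and the example dataset are hypothetical, and a specific license or data provider may ask for different wording.

```python
from typing import Sequence


def join_creators(creators: Sequence[str]) -> str:
    """Join creator names as 'A', 'A and B', or 'A, B, and C'."""
    names = list(creators)
    if not names:
        raise ValueError("at least one creator is required")
    if len(names) == 1:
        return names[0]
    if len(names) == 2:
        return f"{names[0]} and {names[1]}"
    return ", ".join(names[:-1]) + f", and {names[-1]}"


def attribution_line(title: str, creators: Sequence[str],
                     source_url: str, license_name: str) -> str:
    """Build a Title-Author-Source-License (TASL) credit line for a dataset."""
    return (f'"{title}" by {join_creators(creators)} '
            f'({source_url}), licensed under {license_name}.')


# Hypothetical dataset used only to illustrate the output format.
print(attribution_line("Example Survey Data", ["A. Author", "B. Author"],
                       "https://example.org/data", "CC BY 4.0"))
# prints: "Example Survey Data" by A. Author and B. Author (https://example.org/data), licensed under CC BY 4.0.
```

Keeping the four fields separate means the same helper works for any license that asks for credit; only `license_name` changes.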

4. COMMUNITY DATA LICENSE AGREEMENT – CDLA PERMISSIVE-2.0


Like most open-source licenses, this license allows users to use, modify, adapt, and share the dataset,
but only if a disclaimer of warranties and liability is also included.

5. OPEN DATA COMMONS ATTRIBUTION LICENSE - ODC-BY


This license allows users to share and adapt a dataset, but only if they give credit to the creator(s) of
the dataset.


6. CREATIVE COMMONS ATTRIBUTION-SHAREALIKE 4.0 INTERNATIONAL - CC-BY-SA


This license allows users to use, share, and adapt a dataset, but only if they give credit for the dataset
and show any changes or transformations they made to it. Users might not want to use this license
because they have to share the work they did on the dataset.

7. COMMUNITY DATA LICENSE AGREEMENT – CDLA-SHARING-1.0


This license uses the principle of ‘copyleft’: users can use, modify, and adapt a dataset, but only if they
don’t add license restrictions on the new work(s) they create with the dataset.

8. OPEN DATA COMMONS OPEN DATABASE LICENSE - ODC-ODBL


This license allows users to use, share, and adapt a dataset, but only if they give credit for the dataset and
show any changes or transformations they made to it. Users might not want to use this license
because they have to share the work they did on the dataset.

9. CREATIVE COMMONS ATTRIBUTION-NONCOMMERCIAL 4.0 INTERNATIONAL - CC BY-NC

This license is a restrictive license. Users can share and adapt a dataset, provided they give credit to its
creator(s) and ensure that the dataset is not used for any commercial purpose.

10. CREATIVE COMMONS ATTRIBUTION-NODERIVATIVES 4.0 INTERNATIONAL - CC BY-ND

This license is also a restrictive license. Users can share a dataset if they give credit to its creator(s).
This license does not allow additions, transformations, or changes to the dataset.

11. CREATIVE COMMONS ATTRIBUTION-NONCOMMERCIAL-SHAREALIKE 4.0 INTERNATIONAL - CC BY-NC-SA

This license allows users to share a dataset only if they give credit to its creator(s). Users can share
additions, transformations, or changes to the dataset, but they cannot use the dataset for commercial
purposes.

12. CREATIVE COMMONS ATTRIBUTION-NONCOMMERCIAL-NODERIVATIVES 4.0 INTERNATIONAL - CC BY-NC-ND

This license allows users to share a dataset only if they give credit to its creator(s). Users are not
allowed to modify the dataset and are not allowed to use it for commercial purposes.

Note: Additional license types exist. Check the license details provided with any dataset you use.
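The twelve licenses above differ mainly in three questions: may the data be used commercially, may modified versions be shared, and must shared modifications stay under compatible open terms (share-alike or copyleft). The sketch below encodes those answers in a lookup table so a planned use can be checked quickly. It is a simplified model under stated assumptions: the license identifiers and the names `LicenseTerms`, `LICENSES`, `permits`, and `compatible_licenses` are hypothetical; credit and notice requirements are not modeled; and the commercial-use values for licenses whose descriptions above do not mention it (for example CC BY-ND) come from those licenses' published terms. The license text itself remains the authority.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LicenseTerms:
    """Simplified permissions for one dataset license (not the full legal text)."""
    commercial_use: bool     # may the data be used for commercial purposes?
    share_derivatives: bool  # may modified versions of the data be shared?
    share_alike: bool        # must shared modifications keep compatible open terms?


# Hypothetical lookup table; keys mirror the license names listed above.
LICENSES = {
    "Public Domain": LicenseTerms(True, True, False),
    "PDDL": LicenseTerms(True, True, False),
    "CC-BY": LicenseTerms(True, True, False),
    "CDLA-Permissive-2.0": LicenseTerms(True, True, False),
    "ODC-BY": LicenseTerms(True, True, False),
    "CC-BY-SA": LicenseTerms(True, True, True),
    "CDLA-Sharing-1.0": LicenseTerms(True, True, True),
    "ODC-ODbL": LicenseTerms(True, True, True),
    "CC BY-NC": LicenseTerms(False, True, False),
    "CC BY-ND": LicenseTerms(True, False, False),
    "CC BY-NC-SA": LicenseTerms(False, True, True),
    "CC BY-NC-ND": LicenseTerms(False, False, False),
}


def permits(license_id: str, commercial: bool, share_modified: bool) -> bool:
    """Return True if the planned use fits the simplified terms of license_id.

    Raises KeyError for licenses that are not in the table.
    """
    terms = LICENSES[license_id]
    if commercial and not terms.commercial_use:
        return False
    if share_modified and not terms.share_derivatives:
        return False
    return True


def compatible_licenses(commercial: bool, share_modified: bool) -> List[str]:
    """List every license in the table that allows the planned use."""
    return [name for name in LICENSES if permits(name, commercial, share_modified)]
```

For example, `compatible_licenses(commercial=True, share_modified=True)` returns the eight licenses without NonCommercial or NoDerivatives terms. The `share_alike` flag is stored but not checked, because share-alike adds an obligation on how you release modified data rather than forbidding the use.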

Author(s)
Lakshmi Holla

Other Contributor(s)
Malika Singla

Changelog
Date        Version  Changed by     Change Description
2022-12-14  0.1      Lakshmi Holla  Initial version created
