Following #631, I was surprised to find that we have two very different datasets named "weather".
I'd expected datasets sharing the same base name/stem to represent the same source.
"flights-200k" is the only other duplicated name, but there both the .arrow and .json files represent the same data.
I'm thinking ahead to vega/altair#3631 (comment), where there may be datasets with both .json and .parquet versions.
In that world, a guarantee that a shared stem represents the same data would give us options for resolving the incompatibilities described in vega/altair#3631 (comment).
I do understand renaming would have to be a breaking change, and should not be taken lightly.
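To make the shared-stem problem concrete, here is a minimal sketch that groups resources by `name` and flags any name used by more than one file. The `resources` list is a hypothetical, hand-copied subset reduced to `name`/`path`: the weather entries follow the descriptors quoted further down, and the flights-200k entries are assumed from the file names mentioned above. It is not loaded from the real datapackage.json.

```python
from collections import defaultdict

# Hypothetical subset of datapackage.json "resources", reduced to name/path.
# The flights-200k entries are assumed from the file names in this issue.
resources = [
    {"name": "weather", "path": "weather.json"},
    {"name": "weather", "path": "weather.csv"},
    {"name": "flights-200k", "path": "flights-200k.arrow"},
    {"name": "flights-200k", "path": "flights-200k.json"},
    {"name": "movies", "path": "movies.json"},
]


def duplicated_names(resources):
    """Map each resource name used more than once to the paths that share it."""
    paths_by_name = defaultdict(list)
    for resource in resources:
        paths_by_name[resource["name"]].append(resource["path"])
    return {name: paths for name, paths in paths_by_name.items() if len(paths) > 1}


print(duplicated_names(resources))
# {'weather': ['weather.json', 'weather.csv'], 'flights-200k': ['flights-200k.arrow', 'flights-200k.json']}
```

Grouping only finds shared names. Whether the files behind a name hold the same data (true for flights-200k, false for weather) still needs a separate check.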
Usage
The following uses jsdelivr-stats, covering the past year.
Currently, altair-viz/vega_datasets is by far the largest source of traffic, and it is pinned to v1.29.0.
Versions
| Version | Requests |
| --- | --- |
| v1.29.0 | 6,802,781 |
| v1.31.1 | 215,574 |
| v2.7.0 | 190,102 |
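As a quick check on "by far", this computes the share of v1.29.0 among only the three versions listed in the table (not all versions served):

```python
# Request counts copied from the table above (past year, jsdelivr-stats).
requests = {"v1.29.0": 6_802_781, "v1.31.1": 215_574, "v2.7.0": 190_102}

total = sum(requests.values())  # only the three listed versions
share = requests["v1.29.0"] / total
print(f"v1.29.0: {share:.1%} of {total:,} requests")
# v1.29.0: 94.4% of 7,208,457 requests
```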
v1.29.0
This version does include a weather.csv, but it is not accessible through the Python API:
"weather" is defined as a reference to weather.json only.
The effect this has is quite apparent when comparing the traffic per-file:
| Dataset (v1.29.0) | Requests |
| --- | --- |
| movies.json | 1,804,118 |
| population.json | 654,426 |
| ... | ... |
| 40 most popular | 5,000+ |
| weather.json | 2,154 |
| weather.csv | 31 |
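To illustrate why weather.csv sees so little traffic, here is a minimal sketch of a name-to-file lookup where each dataset name maps to exactly one file. `REGISTRY` and `load_path` are hypothetical and do not reproduce the vega_datasets internals. They only model the behaviour described above, where "weather" refers to weather.json only.

```python
# Hypothetical name -> file registry; each name can point at only one file.
REGISTRY = {
    "movies": "movies.json",
    "population": "population.json",
    "weather": "weather.json",  # weather.csv has no entry, so it is unreachable by name
}


def load_path(name: str) -> str:
    """Return the single file registered for a dataset name."""
    try:
        return REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown dataset: {name!r}") from None


print(load_path("weather"))
# weather.json
```

Under this model, anyone who wants weather.csv has to build its URL by hand. That fits the 31 requests against 2,154 for weather.json.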
Given that v1.29.0 is baked into the package, none of this usage would be affected by the change.
jsdelivr-npm seems to handle version fallback for anyone using @latest, so maybe the impact of this change is fairly limited?
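For reference, jsdelivr's npm endpoint takes the version as part of the URL, so a pinned request and an @latest request differ only in that segment. This sketch assumes the files live under `data/` in the vega-datasets npm package. `jsdelivr_url` is a hypothetical helper.

```python
def jsdelivr_url(version: str, filename: str) -> str:
    """Build a jsdelivr npm URL for a vega-datasets file (assumes a data/ directory)."""
    return f"https://cdn.jsdelivr.net/npm/vega-datasets@{version}/data/{filename}"


# A pinned URL targets a published release, so later renames should not affect it.
pinned = jsdelivr_url("1.29.0", "weather.json")
# An @latest URL follows whatever the newest release contains.
latest = jsdelivr_url("latest", "weather.json")
print(pinned)
print(latest)
```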
Descriptions
weather.json

From `vega-datasets/SOURCES.md`, lines 403 to 405 in 719c388:

## `weather.json`

Instructional dataset showing actual and predicted temperature data.

From `vega-datasets/datapackage.json`, lines 1207 to 1215 in 719c388:

{
"name": "weather",
"type": "json",
"path": "weather.json",
"scheme": "file",
"format": "json",
"mediatype": "text/json",
"encoding": "utf-8"
},
weather.csv

From `vega-datasets/SOURCES.md`, lines 407 to 409 in 719c388:

## `weather.csv`

Data from [NOAA](http://www.ncdc.noaa.gov/cdo-web/datatools/findstation). Transformed using `/scripts/weather.py`. We synthesized the categorical "weather" field from multiple fields in the original dataset. This data is intended for instructional purposes.

From `vega-datasets/datapackage.json`, lines 1679 to 1718 in 719c388:

{
"name": "weather",
"type": "table",
"path": "weather.csv",
"scheme": "file",
"format": "csv",
"mediatype": "text/csv",
"encoding": "utf-8",
"schema": {
"fields": [
{"name": "location", "type": "string"},
{"name": "date", "type": "date"},
{"name": "precipitation", "type": "number"},
{"name": "temp_max", "type": "number"},
{"name": "temp_min", "type": "number"},
{"name": "wind", "type": "number"},
{"name": "weather", "type": "string"}
]
}
},
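To show how far apart the two "weather" descriptors are, this sketch copies their top-level properties by hand and lists the keys whose values differ. The csv schema is reduced to its field names, and `scheme`/`encoding` are left out because they are identical in both. `differing_keys` is a hypothetical helper.

```python
# Top-level properties copied from the two datapackage.json excerpts;
# the csv schema is reduced to a tuple of its field names.
weather_json = {
    "name": "weather", "type": "json", "path": "weather.json",
    "format": "json", "mediatype": "text/json", "fields": None,
}
weather_csv = {
    "name": "weather", "type": "table", "path": "weather.csv",
    "format": "csv", "mediatype": "text/csv",
    "fields": ("location", "date", "precipitation", "temp_max", "temp_min", "wind", "weather"),
}


def differing_keys(a: dict, b: dict) -> list[str]:
    """Return the sorted keys whose values differ between two descriptors."""
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


print(differing_keys(weather_json, weather_csv))
# ['fields', 'format', 'mediatype', 'path', 'type']
```

Only `name` (plus the omitted `scheme` and `encoding`) agrees. The json resource declares no schema at all, while the csv one declares seven typed fields.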
Side note
Having a unique `name` per resource is a part of the spec we don't meet yet.
I'm less interested in that part, as we can just include the suffix if needed.
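The Data Package spec requires each resource `name` to be unique within a package. One way to "just include the suffix" is to append the file extension only to names that collide. This is a minimal sketch. The `stem.ext` naming pattern and the `unique_names` helper are assumptions, not an agreed convention.

```python
from collections import Counter
from pathlib import PurePosixPath


def unique_names(resources):
    """Give each resource a unique name by appending its file suffix on collisions."""
    counts = Counter(resource["name"] for resource in resources)
    renamed = []
    for resource in resources:
        name = resource["name"]
        if counts[name] > 1:
            # e.g. "weather" + ".csv" -> "weather.csv"
            name += PurePosixPath(resource["path"]).suffix
        renamed.append({**resource, "name": name})
    return renamed


resources = [
    {"name": "weather", "path": "weather.json"},
    {"name": "weather", "path": "weather.csv"},
    {"name": "movies", "path": "movies.json"},
]
print([r["name"] for r in unique_names(resources)])
# ['weather.json', 'weather.csv', 'movies']
```

Names that don't collide keep their current form, so existing references to unique names are unaffected.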