breaking: Rename weather.json -> weekly-weather.json #633

@dangotbanned

Following #631, I was surprised to find that we have two very different datasets named "weather".
I'd expected datasets sharing the same base name/stem to represent the same source.
"flights-200k" is the only other duplicated name - but both the .arrow and .json files represent the same data.
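
A minimal sketch of how the shared-stem expectation could be checked, assuming a flat list of file names. The list below is an illustrative subset of the files named in this issue, not the full contents of the repository, and `stems_with_multiple_files` is a hypothetical helper:

```python
from collections import defaultdict
from pathlib import PurePosixPath


def stems_with_multiple_files(paths):
    """Group file names by stem and keep only stems shared by two or more files."""
    by_stem = defaultdict(list)
    for path in paths:
        by_stem[PurePosixPath(path).stem].append(path)
    return {stem: sorted(files) for stem, files in by_stem.items() if len(files) > 1}


# Illustrative subset of file names mentioned in this issue (an assumption,
# not a listing of the actual data directory).
files = [
    "flights-200k.arrow",
    "flights-200k.json",
    "movies.json",
    "weather.csv",
    "weather.json",
]

duplicates = stems_with_multiple_files(files)
for stem, members in duplicates.items():
    print(stem, members)
```

This only finds stems that are shared; whether the files behind a shared stem actually hold the same data still needs a manual or schema-level comparison.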

I'm thinking ahead towards (vega/altair#3631 (comment)), where there may be datasets with .json and .parquet versions.
In that world, a guarantee that a shared stem represents the same data would provide options to resolve the incompatibilities in (vega/altair#3631 (comment)).

I do understand renaming would have to be a breaking change, and should not be taken lightly.

Usage

The figures below come from jsdelivr-stats and cover the past year.

Currently, altair-viz/vega_datasets, which is pinned to v1.29.0, is by far the greatest source of traffic.

Versions

| Version | Requests |
|---------|----------|
| v1.29.0 | 6,802,781 |
| v1.31.1 | 215,574 |
| v2.7.0 | 190,102 |

v1.29.0

There is a weather.csv in this version, but it is not accessible through the Python API.
"weather" is defined as a reference to weather.json only.

The effect this has is quite apparent when comparing the traffic per-file:

| Dataset (v1.29.0) | Requests |
|-------------------|----------|
| movies.json | 1,804,118 |
| population.json | 654,426 |
| ... | ... |
| 40 most popular | 5,000+ |
| weather.json | 2,154 |
| weather.csv | 31 |
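
To put those counts in proportion, here is a small sketch that expresses each file's requests as a share of the v1.29.0 total. It assumes the per-file counts and the per-version total quoted above come from the same measurement window and are directly comparable; `share_of_total` is a hypothetical helper:

```python
# Request counts copied from the v1.29.0 figures quoted in this issue.
VERSION_TOTAL = 6_802_781
requests_per_file = {
    "movies.json": 1_804_118,
    "population.json": 654_426,
    "weather.json": 2_154,
    "weather.csv": 31,
}


def share_of_total(count, total):
    """Return count as a percentage of total."""
    return 100 * count / total


shares = {
    name: share_of_total(count, VERSION_TOTAL)
    for name, count in requests_per_file.items()
}

for name, pct in shares.items():
    print(f"{name:<16} {pct:.4f}%")
```

Under that assumption, both weather files together account for well under 0.1% of the v1.29.0 requests.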

Given that v1.29.0 is baked into the package, none of this usage would be impacted by the change.

jsdelivr-npm seems to be able to handle version-fallback in the event that anyone is using @latest - so maybe the impact of this change is fairly limited?

Descriptions

weather.json

vega-datasets/SOURCES.md

Lines 403 to 405 in 719c388

## `weather.json`
Instructional dataset showing actual and predicted temperature data.

vega-datasets/datapackage.json

Lines 1207 to 1215 in 719c388

{
  "name": "weather",
  "type": "json",
  "path": "weather.json",
  "scheme": "file",
  "format": "json",
  "mediatype": "text/json",
  "encoding": "utf-8"
},

weather.csv

vega-datasets/SOURCES.md

Lines 407 to 409 in 719c388

## `weather.csv`
Data from [NOAA](http://www.ncdc.noaa.gov/cdo-web/datatools/findstation). Transformed using `/scripts/weather.py`. We synthesized the categorical "weather" field from multiple fields in the original dataset. This data is intended for instructional purposes.

vega-datasets/datapackage.json

Lines 1679 to 1718 in 719c388

  "name": "weather",
  "type": "table",
  "path": "weather.csv",
  "scheme": "file",
  "format": "csv",
  "mediatype": "text/csv",
  "encoding": "utf-8",
  "schema": {
    "fields": [
      {
        "name": "location",
        "type": "string"
      },
      {
        "name": "date",
        "type": "date"
      },
      {
        "name": "precipitation",
        "type": "number"
      },
      {
        "name": "temp_max",
        "type": "number"
      },
      {
        "name": "temp_min",
        "type": "number"
      },
      {
        "name": "wind",
        "type": "number"
      },
      {
        "name": "weather",
        "type": "string"
      }
    ]
  }
},

Side note

Having a unique name per-resource is part of the spec we haven't met yet.
I'm less interested in that part as we can just include the suffix if needed.
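
As a rough sketch of the "include the suffix" option, assuming resources are plain dicts with `name` and `path` keys as in the datapackage.json excerpts above. `disambiguate_names` is a hypothetical helper, not something that exists in the repository, and the sample list is trimmed to the fields that matter here:

```python
from collections import Counter


def disambiguate_names(resources):
    """Return copies of the resources; any name used more than once is
    replaced by the resource's path, which includes the file suffix."""
    counts = Counter(resource["name"] for resource in resources)
    return [
        {**resource, "name": resource["path"]}
        if counts[resource["name"]] > 1
        else dict(resource)
        for resource in resources
    ]


# Trimmed, illustrative resources (an assumption, not the full descriptor).
resources = [
    {"name": "weather", "path": "weather.json"},
    {"name": "weather", "path": "weather.csv"},
    {"name": "movies", "path": "movies.json"},
]

unique = disambiguate_names(resources)
names = [resource["name"] for resource in unique]
print(names)
```

Names that are already unique are left alone, so only the colliding entries would change under this approach.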
