
test: update test_datasets.py #3857

Merged
mattijn merged 17 commits into main from improve-tests-datasets
Aug 22, 2025

Conversation

@mattijn
Contributor

@mattijn mattijn commented Jul 15, 2025

This PR improves the datasets test suite based on the feedback in #3854, so that the automated tests now reflect the implementation. Namely:

  • Fixed test failures: corrected the regex pattern in test_all_datasets to properly match error messages (the test was looking for icon_7zip.png but the actual error shows 7zip.png).
  • Added data API tests: integrated comprehensive tests for the new data accessor API into the main test file.
  • Streamlined test coverage: removed the redundant is_url test, which, with vega-datasets 3.2.0, only validated precomputed URLs.
  • Maintained core functionality: preserved all essential tests for backend functionality, error handling, caching, and data parsing.

The test suite now provides better coverage, and the test names reflect what they cover. No changes to the code in altair were required.
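The regex fix in the first bullet can be sketched as follows (a minimal stand-in with a hypothetical loader and error message; the real test uses pytest.raises with a match pattern):

```python
import re

def load(path: str) -> None:
    # Hypothetical stand-in for the loader: images are not tabular datasets.
    raise ValueError(f"Unable to parse {path!r} as a dataset")

def error_matches(pattern: str, path: str) -> bool:
    # Mirrors pytest.raises(..., match=pattern): re.search on the message.
    try:
        load(path)
    except ValueError as exc:
        return re.search(pattern, str(exc)) is not None
    return False

# The old pattern expected "icon_7zip.png", but the message contains "7zip.png".
assert not error_matches(re.escape("icon_7zip.png"), "7zip.png")
assert error_matches(re.escape("7zip.png"), "7zip.png")
```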

Could you have a look to see whether this addresses your concerns, @dangotbanned?

Closes #3854

@dangotbanned
Member

dangotbanned commented Jul 16, 2025

Thanks @mattijn, I've only skimmed through but this is looking better! 🎉

Note

I'll aim to do a proper review soon

One thing that stands out to me on a first look:

No changes to the code in altair were required.

Apologies for not being clearer: I'm absolutely fine with further changes being made in both altair and tools.

test_pandas_date_parse

I think that at least (#3854 (comment)) will require some changes outside of the test suite - since that issue seems very related to (vega/vega-datasets#702).

My guess is that because that PR introduced a disconnect between the Resource.name and Resource.path, like below, the way I was previously handling these fields probably needs to be updated in some way:

| Before | After (vega/vega-datasets#702) |
| --- | --- |
| `{"name": "7zip.png", "path": "7zip.png"}` | `{"name": "icon_7zip", "path": "7zip.png"}` |
| `{"name": "annual-precip.json", "path": "annual-precip.json"}` | `{"name": "annual_precip", "path": "annual-precip.json"}` |
| `{"name": "flights-200k.arrow", "path": "flights-200k.arrow"}` | `{"name": "flights_200k_arrow", "path": "flights-200k.arrow"}` |
| `{"name": "flights-20k.json", "path": "flights-20k.json"}` | `{"name": "flights_20k", "path": "flights-20k.json"}` |

Important

Would you be able to restore (test_pandas_date_parse), with the only change being the names used in the @pytest.mark.parametrize please?
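For illustration, the kind of property that test guards can be sketched like this (sample data and column names assumed; not the actual test body): pandas needs the schema's temporal columns passed via parse_dates so the resulting dtypes are datetime-like.

```python
import io

import pandas as pd

# Illustrative sample rows in the style of the stocks dataset, not the real data.
CSV = "symbol,date,price\nMSFT,Jan 1 2000,39.81\nMSFT,Feb 1 2000,36.35\n"

# Temporal columns as the schema (via SchemaCache) would report them for pandas.
temporal = ["date"]

df = pd.read_csv(io.StringIO(CSV), parse_dates=temporal)

# Every schema-temporal column should come back as a datetime dtype.
assert all(df[col].dtype.kind == "M" for col in temporal)
```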

For reference:

  • (909e7d0) was the commit that introduced using schema data type information for pandas, in response to @joelostblom's early experience (feat(RFC): Adds altair.datasets #3631 (comment))
  • (a776e2f) was the last major change to SchemaCache
  • Also here's the docstring for SchemaCache:

    class SchemaCache(CompressedCache["_Dataset", "_FlSchema"]):
        """
        `json`_, `gzip`_ -based, lazy schema lookup.

        - Primarily benefits ``pandas``, which needs some help identifying **temporal** columns.
        - Utilizes `data package`_ schema types.
        - All methods return falsy containers instead of exceptions.

        .. _json:
            https://docs.python.org/3/library/json.html
        .. _gzip:
            https://docs.python.org/3/library/gzip.html
        .. _data package:
            https://github.com/vega/vega-datasets/pull/631
        """
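The "falsy containers instead of exceptions" contract from that docstring could be sketched like this (a hypothetical minimal cache, not the real SchemaCache):

```python
class MiniSchemaCache:
    """Hypothetical stand-in for SchemaCache's lookup contract."""

    def __init__(self, schemas: dict) -> None:
        self._schemas = schemas

    def schema(self, name: str) -> dict:
        # Falsy empty dict when the dataset is unknown, rather than raising.
        return self._schemas.get(name, {})

cache = MiniSchemaCache({"stocks": {"symbol": "string", "date": "date"}})
assert cache.schema("stocks")["date"] == "date"
assert cache.schema("missing") == {}  # falsy container, no exception
```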

Does that help demonstrate why this is a regression?

  • I added functionality to solve an issue that came up in the PR
  • I added a test to make sure the data types were at least all temporal data types
  • But now they aren't, and the issue is only showing up in pandas - which is where the problem was in the first place 😔

@dangotbanned dangotbanned self-requested a review July 16, 2025 10:52
Comment thread on tools/datasets/datapackage.py
@dangotbanned
Member

(42236c2)

@mattijn did you generate this?

@mattijn
Contributor Author

mattijn commented Jul 17, 2025

Yes, but ruff and/or mypy aren't happy yet. Or don't you think we should make pylance happy too?

@dangotbanned
Member

Yes, but ruff and/or mypy aren't happy yet. Or don't you think we should make pylance happy too?

No I mean did you write this yourself?

@mattijn
Contributor Author

mattijn commented Jul 17, 2025

I'm not a type expert, so let me leave this to you, but the stubs are now derived from the DataPackage and not from the other type files.

@mattijn
Contributor Author

mattijn commented Aug 22, 2025

Thanks for reviewing @dangotbanned! I've included one raise of AltairDatasetsError for the PyArrow backend, for the case where a date column is in a non-ISO date format; related pyarrow issue: apache/arrow#41488.

This is specifically for the 'stocks' dataset example (the date column contains values like "Jan 1 2000"):

from altair.datasets import data

data.stocks(engine="pyarrow")

AltairDatasetsError: PyArrow cannot parse date format in dataset 'stocks'. This is a known limitation of PyArrow's CSV reader for non-ISO date formats.
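The guard behind that error can be sketched like this (class and function names are assumed for illustration, not Altair's actual implementation):

```python
class AltairDatasetsError(Exception):
    """Stand-in for altair.datasets' error type."""

def read_with_pyarrow(name: str, *, date_format_iso: bool) -> None:
    # Fail early with a dataset-level error when PyArrow's CSV reader
    # cannot handle a non-ISO date column.
    if not date_format_iso:
        msg = (
            f"PyArrow cannot parse date format in dataset {name!r}. "
            "This is a known limitation of PyArrow's CSV reader "
            "for non-ISO date formats."
        )
        raise AltairDatasetsError(msg)
    # ISO-formatted dates would be handed to pyarrow.csv.read_csv here.

try:
    read_with_pyarrow("stocks", date_format_iso=False)  # dates like "Jan 1 2000"
except AltairDatasetsError as exc:
    print(exc)
```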

@mattijn mattijn merged commit 5e9f60e into main Aug 22, 2025
25 checks passed