feat: Show python types in ValidationError messages #3735

Merged
dangotbanned merged 3 commits into main from validation-py-types
Jan 2, 2025
dangotbanned commented Jan 2, 2025

Closes #2914

Description

This PR adapts logic from the code generation tooling for use in SchemaValidationError messages, so that valid types are reported as Python types rather than JSON Schema types.

sort_type_reprs

altair/tools/schemapi/utils.py

Lines 1106 to 1146 in 48e976e

def sort_type_reprs(tps: Iterable[str], /) -> list[str]:
    """
    Shorter types are usually the more relevant ones, e.g. `str` instead of `SchemaBase`.

    We use `set`_ for unique elements, but the lack of ordering requires additional sorts:

    - If types have same length names, order would still be non-deterministic
    - Hence, we sort as well by type name as a tie-breaker, see `sort-stability`_.
    - Using ``str.lower`` gives priority to `builtins`_.
    - Lower priority is given to generated aliases from ``TypeAliasTracer``.
        - These are purely to improve autocompletion
    - ``None`` will always appear last.

    Related
    -------
    - https://github.com/vega/altair/pull/3573#discussion_r1747121600

    Examples
    --------
    >>> sort_type_reprs(["float", "None", "bool", "Chart", "float", "bool", "Chart", "str"])
    ['str', 'bool', 'float', 'Chart', 'None']

    >>> sort_type_reprs(("None", "int", "Literal[5]", "int", "float"))
    ['int', 'float', 'Literal[5]', 'None']

    >>> sort_type_reprs({"date", "int", "str", "datetime", "Date"})
    ['int', 'str', 'date', 'datetime', 'Date']

    .. _set:
        https://docs.python.org/3/tutorial/datastructures.html#sets
    .. _sort-stability:
        https://docs.python.org/3/howto/sorting.html#sort-stability-and-complex-sorts
    .. _builtins:
        https://docs.python.org/3/library/functions.html
    """
    dedup = tps if isinstance(tps, set) else set(tps)
    it = sorted(dedup, key=str.lower)  # Quinary sort
    it = sorted(it, key=len)  # Quaternary sort
    it = sorted(it, key=TypeAliasTracer.is_cached)  # Tertiary sort
    it = sorted(it, key=is_not_stdlib)  # Secondary sort
    it = sorted(it, key=is_none)  # Primary sort
    return it
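
The chain of stable sorts above can also be read as a single sort on a tuple key, most significant criterion first. The following is only a minimal sketch of that idea, not the code in altair: `is_none`, `is_not_stdlib` and `is_alias` are hypothetical stand-ins for the real helpers (which are not shown here), and the `_T` suffix used to spot generated aliases is an assumption made for illustration.

```python
from collections.abc import Iterable

# Hypothetical stand-in: a small set of names treated as "stdlib" types.
_STDLIB = {"bool", "int", "float", "str", "date", "datetime", "None"}


def is_none(tp: str) -> bool:
    return tp == "None"


def is_not_stdlib(tp: str) -> bool:
    return tp not in _STDLIB


def is_alias(tp: str) -> bool:
    # Assumption: generated aliases end in "_T".
    # This only stands in for TypeAliasTracer.is_cached.
    return tp.endswith("_T")


def sort_key(tp: str) -> tuple[bool, bool, bool, int, str]:
    # Most significant criterion first (Primary -> Quinary).
    return is_none(tp), is_not_stdlib(tp), is_alias(tp), len(tp), tp.lower()


def sort_type_reprs_sketch(tps: Iterable[str]) -> list[str]:
    """Deduplicate, then sort once on the combined key."""
    return sorted(set(tps), key=sort_key)


def sort_type_reprs_multipass(tps: Iterable[str]) -> list[str]:
    """Same ordering via repeated stable sorts, least significant key first."""
    it = sorted(set(tps), key=str.lower)
    for key in (len, is_alias, is_not_stdlib, is_none):
        it = sorted(it, key=key)
    return it


print(sort_type_reprs_sketch(["float", "None", "bool", "Chart", "str"]))
```

Because `sorted` is stable, both variants order elements the same way whenever at least one criterion separates them; the multi-pass form keeps each criterion on its own line, which is arguably easier to annotate.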

SchemaInfo.to_type_repr

def to_type_repr(  # noqa: C901
    self,
    *,
    as_str: bool = True,
    target: TargetType = "doc",
    use_concrete: bool = False,
    use_undefined: bool = False,
) -> str | list[str]:
    """
    Return the python type representation of ``SchemaInfo``.

    Includes `altair` classes, standard `python` types, etc.

    Parameters
    ----------
    as_str
        Return as a string.
        Should only be ``False`` during internal recursive calls.
    target: {"annotation", "doc"}
        Where the representation will be used.
    use_concrete
        Avoid base classes/wrappers that don't provide type info.
    use_undefined
        Wrap the result in ``altair.typing.Optional``.
    """
    tps: set[str] = set()
    FOR_TYPE_HINTS: bool = target == "annotation"

    if self.title:
        if target == "annotation":
            tps.update(self.title_to_type_reprs(use_concrete=use_concrete))
        elif target == "doc":
            tps.add(rst_syntax_for_class(self.title))

    if self.is_empty():
        tps.add("Any")
    elif self.is_literal():
        tp_str = spell_literal(self.literal)
        if FOR_TYPE_HINTS:
            tp_str = TypeAliasTracer.add_literal(self, tp_str, replace=True)
        tps.add(tp_str)
    elif FOR_TYPE_HINTS and self.is_union_literal():
        it: Iterator[str] = chain.from_iterable(el.literal for el in self.anyOf)
        tp_str = TypeAliasTracer.add_literal(self, spell_literal(it), replace=True)
        tps.add(tp_str)
    elif self.is_anyOf():
        it_nest = (
            s.to_type_repr(target=target, as_str=False, use_concrete=use_concrete)
            for s in self.anyOf
        )
        tps.update(maybe_rewrap_literal(chain.from_iterable(it_nest)))
    elif FOR_TYPE_HINTS and self.is_type_alias_union():
        it = (
            SchemaInfo(dict(self.schema, type=tp)).to_type_repr(
                target=target, use_concrete=use_concrete
            )
            for tp in self.type
        )
        tps.add(TypeAliasTracer.add_union(self, it, replace=True))
    elif isinstance(self.type, list):
        # We always use title if possible for nested objects
        tps.update(
            SchemaInfo(dict(self.schema, type=tp)).to_type_repr(
                target=target, use_concrete=use_concrete
            )
            for tp in self.type
        )
    elif self.is_array():
        tps.add(
            spell_nested_sequence(self, target=target, use_concrete=use_concrete)
        )
    elif self.type in jsonschema_to_python_types:
        if self.is_object() and use_concrete:
            ...  # HACK: Fall-through case to avoid `dict` added to `TypedDict`
        elif self.is_object() and target == "doc":
            tps.add("dict")
        else:
            tps.add(jsonschema_to_python_types[self.type])
    else:
        msg = "No Python type representation available for this schema"
        raise ValueError(msg)

    if use_concrete:
        if tps >= {"ColorHex", TypeAliasTracer.fmt.format("ColorName"), "str"}:
            # HACK: Remove regular `str` if HEX & CSS color codes are present as well
            tps.discard("str")
        elif len(tps) == 0 and as_str:
            # HACK: There is a single case that ends up empty here
            # See: https://github.com/vega/altair/pull/3536#discussion_r1714344162
            tps = {"Map"}
    return (
        finalize_type_reprs(tps, target=target, use_undefined=use_undefined)
        if as_str
        else sort_type_reprs(tps)
    )
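
To make the branch structure above easier to follow, here is a heavily simplified sketch that works on plain `dict` schemas. It covers only the empty, `anyOf`, list-of-types and basic-type cases. `collect_type_reprs` and the `BASIC_TYPES` table are hypothetical names introduced for illustration; they are not part of altair, and treating an empty dict as "empty" is an assumption about what `is_empty()` checks.

```python
from typing import Any

# Assumed lookup table for illustration; the real jsonschema_to_python_types
# is not shown in this PR description.
BASIC_TYPES = {
    "string": "str",
    "number": "float",
    "integer": "int",
    "boolean": "bool",
    "null": "None",
    "object": "dict",
}


def collect_type_reprs(schema: dict[str, Any]) -> set[str]:
    """Collect Python type names for a small subset of JSON Schema."""
    if not schema:
        # An empty schema accepts anything.
        return {"Any"}
    if "anyOf" in schema:
        # Recurse into each alternative and merge the results.
        out: set[str] = set()
        for sub in schema["anyOf"]:
            out |= collect_type_reprs(sub)
        return out
    tp = schema.get("type")
    if isinstance(tp, list):
        # Split {"type": [a, b]} into one schema per type, as the original
        # does with dict(self.schema, type=tp).
        out = set()
        for single in tp:
            out |= collect_type_reprs({**schema, "type": single})
        return out
    if tp in BASIC_TYPES:
        return {BASIC_TYPES[tp]}
    msg = "No Python type representation available for this schema"
    raise ValueError(msg)


schema = {"anyOf": [{"type": "string"}, {"type": ["number", "null"]}]}
print(sorted(collect_type_reprs(schema)))
```

Returning a `set` from every branch mirrors the original's use of `tps` for deduplication, which is why a separate sorting step is needed before the result is shown to users.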

Examples

from vega_datasets import data
import altair as alt

>>> alt.Chart(data.barley()).mark_bar().encode(
...     x=alt.X("variety"),
...     y=alt.Y("sum(yield)", stack="null"),  # should be eg. stack=None
... )

SchemaValidationError: 'null' is an invalid value for `stack`. Valid values are:

- One of ['zero', 'center', 'normalize']
- Of type `bool | None`

>>> alt.Chart().encode(alt.Angle().sort("invalid_value"))

SchemaValidationError: 'invalid_value' is an invalid value for `sort`. Valid values are:

- One of ['ascending', 'descending']
- One of ['x', 'y', 'color', 'fill', 'stroke', 'strokeWidth', 'size', 'shape', 'fillOpacity', 'strokeOpacity', 'opacity', 'text']
- One of ['-x', '-y', '-color', '-fill', '-stroke', '-strokeWidth', '-size', '-shape', '-fillOpacity', '-strokeOpacity', '-opacity', '-text']
- Of type `Sequence | Mapping[str, Any] | None`

>>> alt.Chart(data.cars()).mark_text().encode(alt.Text("Horsepower:N", bandPosition="4"))

SchemaValidationError: '4' is an invalid value for `bandPosition`. Valid values are of type `float`.

>>> alt.Chart(data.cars()).mark_point().encode(
...     x="Acceleration:Q",
...     y="Horsepower:Q",
...     color=alt.value(1),  # should be eg. alt.value('red')
... )

SchemaValidationError: '1' is an invalid value for `value`. Valid values are of type `str | Mapping[str, Any] | None`.
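
The messages above follow two shapes: a single sentence when only a type constraint applies, and a bulleted list when enum groups are involved. The sketch below reproduces those two shapes from the examples. `format_invalid_value` is a hypothetical helper written for illustration; it is not how altair builds the message internally.

```python
from collections.abc import Sequence


def format_invalid_value(
    value: object,
    param: str,
    *,
    enums: Sequence[Sequence[str]] = (),
    type_union: str = "",
) -> str:
    """Build a message in the style shown in the examples above."""
    head = f"'{value}' is an invalid value for `{param}`. Valid values are"
    if not enums and type_union:
        # Only a type constraint: keep everything on one line.
        return f"{head} of type `{type_union}`."
    # Otherwise list each enum group, then the type union (if any).
    lines = [f"One of {list(group)}" for group in enums]
    if type_union:
        lines.append(f"Of type `{type_union}`")
    bullets = "\n".join(f"- {line}" for line in lines)
    return f"{head}:\n\n{bullets}"


print(format_invalid_value("4", "bandPosition", type_union="float"))
print(
    format_invalid_value(
        "null",
        "stack",
        enums=[["zero", "center", "normalize"]],
        type_union="bool | None",
    )
)
```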

Future Work

This could be extended further by expanding on what is allowed within a Sequence.
I'm stopping short of that in this PR, as it is quite a bit more complex and would require restructuring the format of the entire message.

For now, we've just got basic 1:1 replacements, plus "object": "Mapping[str, Any]", where the str key type follows from the JSON Schema specification for objects: https://json-schema.org/understanding-json-schema/reference/object

_JS_TO_PY: ClassVar[Mapping[str, str]] = {
    "boolean": "bool",
    "integer": "int",
    "number": "float",
    "string": "str",
    "null": "None",
    "object": "Mapping[str, Any]",
    "array": "Sequence",
}

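As a rough illustration of how such a mapping could be applied, the sketch below translates a list of JSON Schema type names and joins them into a `|` union like the ones in the example messages. `py_type_union` is a hypothetical helper, and keeping input order with `None` moved last is an assumption made here; the PR itself reuses the sorting logic from the code generation.

```python
from collections.abc import Iterable, Mapping

# Copy of the mapping above, as a module-level constant for this sketch.
JS_TO_PY: Mapping[str, str] = {
    "boolean": "bool",
    "integer": "int",
    "number": "float",
    "string": "str",
    "null": "None",
    "object": "Mapping[str, Any]",
    "array": "Sequence",
}


def py_type_union(js_types: Iterable[str]) -> str:
    """Translate JSON Schema type names and join them as a union string."""
    # dict.fromkeys deduplicates while keeping first-seen order.
    # Unknown names pass through unchanged.
    seen = dict.fromkeys(JS_TO_PY.get(tp, tp) for tp in js_types)
    names = [name for name in seen if name != "None"]
    if "None" in seen:
        # Keep None at the end, matching the messages shown earlier.
        names.append("None")
    return " | ".join(names)


print(py_type_union(["string", "object", "null"]))
```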
dangotbanned marked this pull request as ready for review January 2, 2025 14:15
dangotbanned merged commit 582a364 into main Jan 2, 2025
dangotbanned deleted the validation-py-types branch January 2, 2025 14:15
Development

Successfully merging this pull request may close these issues.

Show python types instead of javascript types in error messages