Data Validation #1487

@lkstrp

Ref #1128

The following issues should be addressed

Quote from PR:

Summary

  • Add data validation (Pydantic for data classes, pandera for DataFrames) to better handle attribute types, defaults and nullability, introduce immutable class attributes, allow attribute-specific checks, etc.
  • The range of possible checks is wide: much of what the docs currently explain could be moved into validation checks, which could improve the user experience considerably.
  • Removes the arbitrary definition that applied when an attribute is listed in both c.dynamic and c.static.
  • Adds a mechanism for dynamic attribute initialisation. For example, output variables are only added once a network is solved, which can be extended to "configurable" attributes.
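
To make the data class side of this more concrete, here is a minimal sketch of how Pydantic can enforce types, provide default factories and mix immutable with mutable attributes. It assumes Pydantic v2. The class `ComponentsSketch`, its fields and the helper `try_rename` are hypothetical illustrations, not the classes introduced by the PR.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class ComponentsSketch(BaseModel):
    """Hypothetical stand-in for a components data class (not PyPSA's own)."""

    # Re-validate values on assignment, not only at construction time.
    model_config = ConfigDict(validate_assignment=True)

    # Immutable: assigning to this field after creation raises ValidationError.
    name: str = Field(frozen=True)
    # Mutable, with a fresh list created per instance by the default factory.
    carriers: list[str] = Field(default_factory=list)
    # Mutable, type-enforced and constrained to non-negative values.
    default_p_nom: float = Field(default=0.0, ge=0)


def try_rename(component: ComponentsSketch, new_name: str) -> bool:
    """Return True if renaming succeeded, False if validation rejected it."""
    try:
        component.name = new_name
    except ValidationError:
        return False
    return True


generators = ComponentsSketch(name="Generator")
generators.carriers.append("wind")        # mutable attribute, allowed
generators.default_p_nom = 10             # int is coerced to float on assignment
renamed = try_rename(generators, "Load")  # frozen attribute, rejected
```

The point of the sketch is that the error surfaces at the moment of the offending assignment rather than in a later consistency check.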

Data Validation

  • Pydantic is used for data classes
    • Currently only for the Components class and its subclasses and for ComponentType. The plan is to bring it to the Network class as well, once that class is split into data and logic with some refactoring.
    • Makes it possible to enforce types, use simpler default factories, mix immutable with mutable attributes, etc. This alone can make things more robust for both users and developers.
  • Pandera is used for data frames
    • It is a DataFrame validation library that offers a pydantic-style class-based API and integrates with pydantic.
    • Defines schemas for DataFrames for both static and dynamic data.
    • Handles types (and casting), missing columns and nullability for each attribute individually.
    • Attribute-specific settings are handled in the attribute CSVs in pypsa/data/component_attrs/, whose structure has changed somewhat.
  • At the moment the checks are not very strict, and the main benefits are type safety and simplified DataFrame initialisation.
    • But the structure will allow us to set up tons of attribute-specific checks, which will be much easier than what is currently done in check_consistency.
    • We can run these checks when data is added, disallow certain attribute combinations, enforce certain ranges or discrete steps (including interdependent constraints between attributes), and so on. Many of the side notes and explanations in the docs can be turned into instant feedback instead.
    • In a similar way, I would like to bring the same data validation steps to pypsa-eur at some point, where the benefits could be even bigger.
