Allow the validator to accept a "time of validation" argument #1292

@atvaccaro

Describe the problem

As part of the Cal-ITP data pipeline, we collect and validate GTFS and GTFS-RT data over time, running the Schedule and RT validators against hundreds of feeds daily. We sometimes need to re-process old data, for example to change metadata or update validator versions, and the most reliable way is usually a full historical re-processing starting from the raw data. This works for most situations, but it means we sometimes validate data much later than it was originally collected, which affects any validation check that depends on the validator's execution time. (The same situation can arise simply from retries or latency in our batch-processing data pipeline.)

I believe two checks are currently affected: feed_expiration_date_7_days and feed_expiration_date_30_days.

Proposed solution

The validator should accept a timestamp argument to use as the "current time" when determining whether a feed has expired, and in any other time-dependent check. This would make data pipelines more idempotent and less dependent on the actual validator execution time.
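To illustrate the idea, here is a minimal Python sketch of an expiration check that takes the validation time as an argument. It is not the validator's actual code: the function name `expiration_notices`, the `validated_at` parameter, and the exact threshold logic (a feed is flagged when its end date falls within 7 or 30 days of the validation date) are assumptions made for illustration only.

```python
from datetime import date, datetime, timedelta, timezone
from typing import List, Optional


def expiration_notices(
    feed_end_date: date, validated_at: Optional[datetime] = None
) -> List[str]:
    """Return the expiration notices that apply as of `validated_at`.

    Hypothetical helper: the name, signature, and thresholds are assumed
    for this sketch and are not taken from the validator's source.
    """
    # Fall back to the wall clock only when no validation time is supplied.
    if validated_at is None:
        validated_at = datetime.now(timezone.utc)
    today = validated_at.date()

    notices = []
    if feed_end_date <= today + timedelta(days=7):
        notices.append("feed_expiration_date_7_days")
    elif feed_end_date <= today + timedelta(days=30):
        notices.append("feed_expiration_date_30_days")
    return notices


# A feed that ends on 2022-12-31, collected on 2022-11-01.
feed_end = date(2022, 12, 31)
collected_at = datetime(2022, 11, 1, tzinfo=timezone.utc)
reprocessed_at = datetime(2022, 12, 15, tzinfo=timezone.utc)

# Validating "as of" the collection time gives the same answer on every re-run.
as_collected = expiration_notices(feed_end, validated_at=collected_at)
# Using the later re-processing time changes the result for the same data.
as_reprocessed = expiration_notices(feed_end, validated_at=reprocessed_at)
```

With a pinned `validated_at`, re-processing the same raw data yields the same notices regardless of when the job actually runs, which is the idempotency property described above.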

Alternatives you've considered

No response

Additional context

It looks like we have an open PR to add additional checks that would be affected by this.

Deferring to @themightychris and @nkkl but I think we (Jarvus) can look at helping with this after the MVP web validator is deployed.

Labels: enhancement (new feature request or improvement on an existing feature); status: Work in progress (a PR that would close this issue has been opened).
