Description
Describe the problem
As part of the Cal-ITP data pipeline, we collect and validate GTFS and GTFS-RT data over time, which includes running the Schedule and RT validators against hundreds of feeds daily. Sometimes we need to re-process old data, for example to change metadata or update validator versions; usually the best way is to run a full historical re-processing starting from the raw data. While this is fine for most situations, it means we sometimes validate data much later than it was originally collected, which changes the outcome of validation checks that depend on the validator execution time. (The same situation can occur simply due to retries or latency in our batch processing data pipeline.)
I believe currently there are two affected checks: feed_expiration_date_7_days and feed_expiration_date_30_days.
Proposed solution
The validator should accept a timestamp argument to use in place of the execution time when determining whether a feed has expired, etc. This would allow data pipelines to be more idempotent and less reliant on actual validator execution time.
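To illustrate the idea, here is a minimal Python sketch of an expiration check that takes the reference time as a parameter. It is not the validator's actual code or API: the function name `expiration_notices`, the `validated_at` parameter, and the exact threshold comparison are assumptions made for illustration. Only the two notice names come from the checks mentioned above.

```python
from datetime import date, datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical mapping of look-ahead windows (in days) to the affected notices.
WINDOWS = {
    7: "feed_expiration_date_7_days",
    30: "feed_expiration_date_30_days",
}


def expiration_notices(
    feed_end_date: date, validated_at: Optional[datetime] = None
) -> List[str]:
    """Return the expiration notices triggered relative to `validated_at`.

    `validated_at` is the hypothetical timestamp argument; when it is
    omitted, the check falls back to the wall-clock time, which is the
    behavior that makes re-processing non-idempotent.
    """
    if validated_at is None:
        validated_at = datetime.now(timezone.utc)
    reference_day = validated_at.date()

    notices = []
    for days, notice in sorted(WINDOWS.items()):
        # Assumed rule: warn when the feed ends within `days` of the reference day.
        if feed_end_date <= reference_day + timedelta(days=days):
            notices.append(notice)
    return notices


# A feed collected on 2022-06-01 whose feed_end_date is 2022-06-30.
collected_at = datetime(2022, 6, 1, tzinfo=timezone.utc)
reprocessed_at = datetime(2022, 12, 1, tzinfo=timezone.utc)
end_date = date(2022, 6, 30)

at_collection = expiration_notices(end_date, validated_at=collected_at)
at_reprocessing = expiration_notices(end_date, validated_at=reprocessed_at)
```

With the collection time passed in, a historical re-run reproduces the original result (`at_collection`), whereas using the re-processing time reports the feed as already expired under both checks (`at_reprocessing`).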
Alternatives you've considered
No response
Additional context
It looks like we have an open PR to add additional checks that would be affected by this.
Deferring to @themightychris and @nkkl but I think we (Jarvus) can look at helping with this after the MVP web validator is deployed.