feat: frequencies.txt exact_times=0 trips must not have stop_times.timepoint=1 records#887

Closed
lionel-nj wants to merge 3 commits into master from new-rule/exact_times_trips_must_not_have_stop_times_timepoint_equals_1_records

@lionel-nj
Contributor

@lionel-nj lionel-nj commented May 3, 2021

closes #823

Summary:

This PR provides support to verify that frequency-based trips do not have stop_times.timepoint=1 records.

New notice:

  • InvalidFrequencyBasedTripNotice

Expected behavior:

An InvalidFrequencyBasedTripNotice should be generated in the following case:

frequencies.txt

trip_id exact_times
t0 0

stop_times.txt

trip_id timepoint
t0 1 (or empty)

Other field combinations should not generate a notice.
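A minimal Python sketch of the check described above, for illustration only. It is not the code in this PR (the validator itself is written in Java), and the function name `find_flagged_trips` and the dict-based rows are assumptions made for this example. It reads the expected behavior literally: a trip is flagged when it has `exact_times=0` in frequencies.txt and at least one stop_times.txt record whose `timepoint` is `1` or empty.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]  # one CSV record, keyed by column name (assumed shape)


def find_flagged_trips(frequencies: Iterable[Row], stop_times: Iterable[Row]) -> List[str]:
    """Return the trip_ids the proposed rule would flag, sorted."""
    # Trips declared frequency-based with exact_times=0 in frequencies.txt.
    frequency_based = {
        row["trip_id"] for row in frequencies if row.get("exact_times") == "0"
    }
    # A stop_times record with timepoint "1" or empty counts as an exact time.
    flagged = {
        row["trip_id"]
        for row in stop_times
        if row["trip_id"] in frequency_based
        and row.get("timepoint", "") in ("1", "")
    }
    return sorted(flagged)


frequencies = [
    {"trip_id": "t0", "exact_times": "0"},
    {"trip_id": "t1", "exact_times": "1"},
]
stop_times = [
    {"trip_id": "t0", "timepoint": ""},
    {"trip_id": "t1", "timepoint": "1"},
]
print(find_flagged_trips(frequencies, stop_times))  # ['t0']
```

Only `t0` is reported here: `t1` has `exact_times=1`, so its `timepoint=1` record falls outside the rule.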


lionel-nj added 2 commits May 3, 2021 17:21
- additional unit tests
- update rules documentation
@lionel-nj lionel-nj requested a review from barbeau May 3, 2021 21:40
@lionel-nj lionel-nj self-assigned this May 3, 2021
@aababilov
Collaborator

There are 303 feeds and 656K trips that run on Google that would be treated invalid and rejected according to the proposed validation rule. 

I cannot agree with the invariant being checked. A trip can be frequency-based, so the exact start of the trip is unknown. However, once the vehicle starts moving, it visits each stop in predictable time since the time difference between each stop is fixed (hence timepoint=1). This is a usual case for metro/subway/underground. So, frequency-based trips may have timepoint=1.
I see that this PR demonstrates some problems that we already faced several months ago. Those problems put the future of the open-source validator at risk, which is why it is important to find a way to prevent them.

How was that code tested, i.e., what was the amount of the real-world feeds used for testing? It may be the case that the test sample used by MobilityData is not representative and it must be extended with more feeds.

Where did the idea for #823 come from? Was that requested by some GTFS provider or consumer?
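To make the use case in this comment concrete, here is a small Python sketch under stated assumptions: the stop_times.txt times of a frequency-based trip are read as fixed offsets from the first stop, and those offsets are applied to whatever time a vehicle actually departs. The helper names (`parse_gtfs_time`, `stop_offsets`, `project_run`) and the sample times are hypothetical and do not come from the validator or from any feed.

```python
from typing import List


def parse_gtfs_time(value: str) -> int:
    """Convert a GTFS HH:MM:SS time (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_gtfs_time(total_seconds: int) -> str:
    """Convert seconds back to a GTFS HH:MM:SS string."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def stop_offsets(times: List[str]) -> List[int]:
    """Seconds elapsed at each stop, relative to the first stop of the trip."""
    in_seconds = [parse_gtfs_time(t) for t in times]
    return [s - in_seconds[0] for s in in_seconds]


def project_run(departure: str, offsets: List[int]) -> List[str]:
    """Apply the fixed stop-to-stop offsets to one actual departure time."""
    start = parse_gtfs_time(departure)
    return [format_gtfs_time(start + offset) for offset in offsets]


offsets = stop_offsets(["08:00:00", "08:04:00", "08:09:30"])
print(offsets)                           # [0, 240, 570]
print(project_run("06:10:00", offsets))  # ['06:10:00', '06:14:00', '06:19:30']
```

Under this reading the departure time is not scheduled, but the intervals between stops are exact, which is the situation where `timepoint=1` on a frequency-based trip is meaningful.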

@lionel-nj
Contributor Author

lionel-nj commented May 5, 2021

There are 303 feeds and 656K trips that run on Google that would be treated invalid and rejected according to the proposed validation rule.

Thanks for flagging that! 🙏🏾

How was that code tested, i.e., what was the amount of the real-world feeds used for testing? It may be the case that the test sample used by MobilityData is not representative and it must be extended with more feeds.

This quarter, MobilityData is actively working to set up a larger-scale testing system (see #848). It is in code review and not ready yet, which is why this PR has not been checked against a large number of datasets.

I cannot agree with the invariant being checked. A trip can be frequency-based, so the exact start of the trip is unknown. However, once the vehicle starts moving, it visits each stop in predictable time since the time difference between each stop is fixed (hence timepoint=1). This is a usual case for metro/subway/underground. So, frequency-based trips may have timepoint=1.

Thanks for the clarification. @MobilityData/transit-specs @barbeau should we clarify the GTFS specification to help explain the usage of these fields (stop_times.timepoint and frequencies.exact_times)?

@barbeau
Member

barbeau commented May 5, 2021

There are 303 feeds and 656K trips that run on Google that would be treated invalid and rejected according to the proposed validation rule.

Thanks, that's definitely good to know. I opened #823 for this rule based on discussions I've had with producers that misunderstood what exact_times=0 trips (true frequency-based trips) are.

However, once the vehicle starts moving, it visits each stop in predictable time since the time difference between each stop is fixed (hence timepoint=1).

That makes sense to me as a valid use case (although based on my past discussions with producers I'd be surprised if all 656k trips are actually this valid use case and not a data error).

@MobilityData/transit-specs @barbeau should we clarify the GTFS specification to help explain the usage of these fields (stop_times.timepoint and frequencies.exact_times)?

Yes, I think that's the best solution here (and not to adopt this rule in the validator).

The timepoint field needs clarification anyway (see google/transit#61), and IMHO the concept of exact_times=0 trips also needs better explanation in the spec. Clarifying how the two interact would definitely help.

I can't think of another way in the validator (based on the current spec at least) to differentiate the valid use case from agencies mistakenly assigning wall-clock timepoints to exact_times=0 trips. One approach that would require a spec change is for consumers to start exact_times=0 trips at midnight to clearly differentiate the use of stop_times.txt time records for time offsets rather than wall-clock times.
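As a rough illustration of the midnight-start idea, the Python sketch below checks whether a trip's stop_times.txt times begin at 00:00:00, which under that hypothetical convention would signal time offsets rather than wall-clock times. This convention is not part of the current spec, and the function names are assumptions made for this example.

```python
from typing import List


def parse_gtfs_time(value: str) -> int:
    """Convert a GTFS HH:MM:SS time (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def starts_at_midnight(trip_times: List[str]) -> bool:
    """True when the earliest time of the trip is 00:00:00.

    Under the hypothetical convention discussed above, this would mark the
    trip's stop_times values as offsets instead of wall-clock times.
    """
    return min(parse_gtfs_time(t) for t in trip_times) == 0


print(starts_at_midnight(["00:00:00", "00:04:00", "00:09:30"]))  # True
print(starts_at_midnight(["08:00:00", "08:04:00", "08:09:30"]))  # False
```

A check like this only becomes reliable if producers adopt the convention. With the current spec, a trip that genuinely starts at midnight would look identical.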

@lionel-nj
Contributor Author

Closing for now as we will not implement this invariant - we might revisit this in the future following spec clarification.

@timMillet

@MobilityData/transit-specs @barbeau should we clarify the GTFS specification to help explain the usage of these fields (stop_times.timepoint and frequencies.exact_times)?

Issue created to keep track of this: https://github.com/MobilityData/gtfs-tasks/issues/119

Development

Successfully merging this pull request may close these issues.

Frequencies.txt exact_times=0 trips must not have stop_times.timepoint=1 records

4 participants