Skip to content

Relates to #14. Add GTFS Time type validation#21

Merged
6 commits merged intomasterfrom
time-gtfs-type
Dec 5, 2019
Merged

Relates to #14. Add GTFS Time type validation#21
6 commits merged intomasterfrom
time-gtfs-type

Conversation

@ghost
Copy link
Copy Markdown

@ghost ghost commented Dec 2, 2019

Summary:

Add validation for Time GTFS type as defined in specification https://gtfs.org/reference/static/#field-types .
Relates to #14

Expected behavior:
image

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with gradle test to make sure you didn't break anything
  • Format the title like "Fix #<issue_number> - " (for example - "Fix Bug: Running using documented Docker documentation fails #1111 - Check for null value before using field")
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

Fabrice V added 2 commits November 29, 2019 15:32
-for Float and Integer types. Including range validation.
-refactor String validation to expose only GTFS types to client code (id and text)
-add validation code
  -specified format HH:MM:SS or H:MM:SS
-add unit testing of method
@ghost ghost requested a review from barbeau December 2, 2019 22:05
@ghost ghost self-assigned this Dec 2, 2019
@ghost
Copy link
Copy Markdown
Author

ghost commented Dec 2, 2019

@barbeau I think there are 2 commits because I didn't branch correctly locally. Sorry about that. This PR regards only changes in a306d2d

Copy link
Copy Markdown
Member

@barbeau barbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM (looking only at a306d2d), a few comments in-line.

@ghost
Copy link
Copy Markdown
Author

ghost commented Dec 3, 2019

Invalid field value should be Invalid time field value to differentiate it from other type rules - same for other type rules.

Errors are differentiated in the errorDescription field with the title always being the same (title = generic, description = specific). Do you think title should also be made specific ?

@ghost
Copy link
Copy Markdown
Author

ghost commented Dec 3, 2019

I think this allows for HHH:MM:SS? The GTFS spec only specifies HH:MM:SS, so I'd limit to this:
https://github.com/google/transit/blob/master/gtfs/spec/en/reference.md#field-types

Added to the test suite, seems to correctly reject HHH:MM:SS as is

@ghost
Copy link
Copy Markdown
Author

ghost commented Dec 3, 2019

I would suggest a separate warning for flagging suspicious times. For example, very rarely will 99:59:59 actually be a valid value in stop_times.txt - it would need to be a four day trip. We can't flag these as errors as they could be correct in outlier cases, but we should flag as warnings so bad times more visible.

Yep, I thought as well. Could you guide me as what would be a sensible cutoff value to make the distinction ? something like over 26:59:59 being suspicious ?

-add test to check for HHH:MM:SS rejection
@barbeau
Copy link
Copy Markdown
Member

barbeau commented Dec 3, 2019

Errors are differentiated in the errorDescription field with the title always being the same (title = generic, description = specific). Do you think title should also be made specific ?

Yes, I'd make the title specific too. It makes it much easier to understand what the rule is when scanning a table like https://github.com/CUTR-at-USF/gtfs-realtime-validator/blob/master/RULES.md#table-of-errors.

@barbeau
Copy link
Copy Markdown
Member

barbeau commented Dec 3, 2019

Yep, I thought as well. Could you guide me as what would be a sensible cutoff value to make the distinction ? something like over 26:59:59 being suspicious ?

Choosing the threshold is the hard part :). When possible we tried to base the thresholds in GTFS-rt validator on real data. So the scientific way to do this would be to run a query against https://openmobilitydata.org/ and find the 95th percentile of times, and use this. You could ask Leo if it's ok to reach out to Matt at TransitScreen to ask if they could run this type of query (Drew at Interline for TransitLand would be the other option). Outside of this, though, you could post a question to GitHub google/transit, or the transit-developers group. My personal guess is that 26:59:59 seems reasonable. The problem with asking, though, is that everyone will have their own opinions and then you have to pick one, invalidating everyone else's :).

For any such thresholds, I would also suggest that we consider making them user-configurable via command-line parameters.

@ghost
Copy link
Copy Markdown
Author

ghost commented Dec 3, 2019

Yep, I thought as well. Could you guide me as what would be a sensible cutoff value to make the distinction ? something like over 26:59:59 being suspicious ?

Choosing the threshold is the hard part :). When possible we tried to base the thresholds in GTFS-rt validator on real data. So the scientific way to do this would be to run a query against https://openmobilitydata.org/ and find the 95th percentile of times, and use this. You could ask Leo if it's ok to reach out to Matt at TransitScreen to ask if they could run this type of query (Drew at Interline for TransitLand would be the other option). Outside of this, though, you could post a question to GitHub google/transit, or the transit-developers group. My personal guess is that 26:59:59 seems reasonable. The problem with asking, though, is that everyone will have their own opinions and then you have to pick one, invalidating everyone else's :).

For any such thresholds, I would also suggest that we consider making them user-configurable via command-line parameters.

hey @LeoFrachet , what do you think would be a sensible approach to defining the threshold value ?

Fabrice V added 3 commits December 3, 2019 18:00
# Conflicts:
#	src/main/java/org/mobilitydata/gtfsvalidator/conversion/CSVtoProtoConverter.java
#	src/main/java/org/mobilitydata/gtfsvalidator/rules/ValidationRules.java
#	src/main/java/org/mobilitydata/gtfsvalidator/util/GTFSTypeValidationUtils.java
#	src/test/java/org/mobilitydata/gtfsvalidator/util/GTFSTypeValidationUtilsTest.java
-required changes after merging PR #19 to master
-add check for suspicious value leading to a warning
  -default set at >= 27:00:00
@ghost
Copy link
Copy Markdown
Author

ghost commented Dec 5, 2019

Merging this with a '27:00:00' warning threshold while opening a feature issue to fine tune the value

@ghost ghost merged commit 1c847f1 into master Dec 5, 2019
@ghost ghost deleted the time-gtfs-type branch December 5, 2019 16:11
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant