Skip to content

Relates to #14. Refactor GTFS types validation#19

Merged
4 commits merged intomasterfrom
apache-commons-validator-refactor
Dec 3, 2019
Merged

Relates to #14. Refactor GTFS types validation#19
4 commits merged intomasterfrom
apache-commons-validator-refactor

Conversation

@ghost
Copy link
Copy Markdown

@ghost ghost commented Nov 29, 2019

Summary:

Having added Apache commons validators for URL validation, it seemed logical to do a refactor to use the FloatValidator and IntegerValidator. No change to test suite.
https://commons.apache.org/proper/commons-validator/
Relates to #14

Also included in this PR is a small refactor regarding String validation. validateString been made private and publicly exposed validateId and validateText have been made available. This way, client code directly call the method associated with the expected GTFS type for the considered CSV field.

Expected behavior:

image

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with gradle test to make sure you didn't break anything
  • Format the title like "Fix #<issue_number> - " (for example - "Fix Bug: Running using documented Docker documentation fails #1111 - Check for null value before using field")
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

-for Float and Integer types. Including range validation.
-refactor String validation to expose only GTFS types to client code (id and text)
@ghost ghost requested a review from barbeau November 29, 2019 21:08
@ghost ghost self-assigned this Nov 29, 2019
Copy link
Copy Markdown
Member

@barbeau barbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, although a few comments in-line.

Copy link
Copy Markdown
Author

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my reply

Fabrice V added 2 commits December 3, 2019 10:46
-add potential source of Locale to use for parsing, to be fixed later
-occurrence prefix formatted with entity id as well as problematic value
-tests have been changed to reflect that
-rule title and postfix have been clarified
@ghost ghost requested a review from barbeau December 3, 2019 21:19
Copy link
Copy Markdown
Member

@barbeau barbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, although a few comments inline.

}

Float length = GTFSTypeValidationUtils.parseAndValidateFloat("length",
Float length = GTFSTypeValidationUtils.parseAndValidateFloat(pathwayId,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be dependent on how these error messages are surfaced in a UI or managed more generally in code, but I think you'll want to capture the ID field name here as well (in this case, pathway_id). It may not always be obvious which file you're validating.

So it may be hard to immediately decipher:

id: 1234 length is -0.001

..., while this is much easier to immediately understand:

pathway_id: 1234 length is -0.001

I think all IDs in GTFS end with _id now (?), although that's not a required naming convention and probably would be best not to assume it.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I thought field names being quite unique would make it clear enough. I'll make the change.

assertEquals(1, testList.size());
error = testList.get(0);
assertEquals(fieldName, error.getPrefix());
assertEquals("id: " + validatedEntityId + " " + fieldName + " is " + "-0.001", error.getPrefix());
Copy link
Copy Markdown
Member

@barbeau barbeau Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest instead of concatenating the variables in the "expected" parameter, actually hard-coding the entire string with sample values that are very close to real ones. It makes it much easier to read what the expected output is and you're more likely to catch concatenation issues like missing spaces or commas.

So an expected value here could be:

pathway_id: 1234 length is -0.001

You could just test the entire concatenated prefix and suffix as well against one hard-coded string, to make sure the prefix and suffix concatenating correctly too.

IIRC the GTFS-rt validator tests the prefix but not prefix + suffix concatenation - that was on my TODO list but I never got to it. Concatenation errors in the messages (missing/extra spaces, commas, colons) was probably one of the most common issues I caught when writing most of the RT validator tests.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep ok, that makes sense.

-testing against a full hard coded string as opposed to concatenating
-capturing ID field name for occurrence prefix
Copy link
Copy Markdown
Author

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ghost
Copy link
Copy Markdown
Author

ghost commented Dec 3, 2019

Looks good! Although I'd suggest using real GTFS field names in the tests. Not only does it make it easier to read, it gives a real example of bad data that could occur in the real world.

Many of the GTFS-rt tests were written due to running into bad data in the real world, and we ended up using that real bad data in many of the unit tests.

Ok, opening an improvement issue for that.

@ghost ghost merged commit ce1b8fd into master Dec 3, 2019
@ghost ghost deleted the apache-commons-validator-refactor branch December 3, 2019 22:54
ghost pushed a commit that referenced this pull request Dec 3, 2019
-required changes after merging PR #19 to master
ghost pushed a commit that referenced this pull request Dec 3, 2019
ghost pushed a commit that referenced this pull request Dec 5, 2019
* GTFS types validation. Using Apache commons

-for Float and Integer types. Including range validation.
-refactor String validation to expose only GTFS types to client code (id and text)

* GTFS types validation. time

-add validation code
  -specified format HH:MM:SS or H:MM:SS
-add unit testing of method

* GTFS types validation. time

-add test to check for HHH:MM:SS rejection

* GTFS types validation. time

-required changes after merging PR #19 to master

* GTFS types validation. time

-add check for suspicious value leading to a warning
  -default set at >= 27:00:00
ghost pushed a commit that referenced this pull request Dec 5, 2019
* GTFS types validation. Using Apache commons

-for Float and Integer types. Including range validation.
-refactor String validation to expose only GTFS types to client code (id and text)

* Stops.txt -- GTFS types validation + protobuf conversion

* GTFS types validation. time

-add validation code
  -specified format HH:MM:SS or H:MM:SS
-add unit testing of method

* GTFS types validation. time

-add test to check for HHH:MM:SS rejection

* GTFS types validation. time

-required changes after merging PR #19 to master

* Stops.txt -- GTFS types validation + protobuf conversion

-required changes after merging PR #19
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant