Skip to content

Implement #80 - Execute GTFS validation on a subset of files#188

Merged
lionel-nj merged 41 commits intomasterfrom
file-exclusion
May 28, 2020
Merged

Implement #80 - Execute GTFS validation on a subset of files#188
lionel-nj merged 41 commits intomasterfrom
file-exclusion

Conversation

@lionel-nj
Copy link
Copy Markdown
Contributor

@lionel-nj lionel-nj commented May 11, 2020

Summary:

This PR suggests code to execute the semantic GTFS validation on a subset of files.

Expected behavior:

User requires the exclusion of some files via configuration file (.json) or command line option. The use cases should be able to determine all GTFS file that depend on the input (list of files).
The last step is to execute the validation process on the resultant subset of files.

This PR ensures a graceful fail in case user input is not valid: validation is executed on the whole GTFS dataset.

  • use JsonGraph to write a file containing information about the dependencies among GTFS files

  • write a class to read such file and extract from the resulting object the list of files to exclude from the validation process given an input.
    More precisely, if the use requires files routes.txt to not be validated, the use case should determine that since routes.txt is excluded, then files trips.txt, fare_rules.txt, attributions.txt, frequencies.txt, stop_times.txt.

  • check validity of user input

  • write use case GenerateExclusionFilenameList tests

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with gradle test to make sure you didn't break anything
  • Format the title like "Fix #<issue_number> - " (for example - "Fix Bug: Running using documented Docker documentation fails #1111 - Check for null value before using field")
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

Running gtfs-validator with the following input, leads to the following information logging
{ "extract": "input", "output": "output", "proto": false, "url": "https://openmobilitydata.org/p/societe-de-transport-de-montreal/39/latest/download", "zipinput": null, "exclude": "shapes.txt" }

Capture d’écran, le 2020-05-26 à 18 48 29

Running gtfs-validator with the following (wrong: shapess.txt) input, leads to the following information logging
{ "extract": "input", "output": "output", "proto": false, "url": "https://openmobilitydata.org/p/societe-de-transport-de-montreal/39/latest/download", "zipinput": null, "exclude": "shapess.txt" }
Capture d’écran, le 2020-05-26 à 18 50 49

Lionel N added 4 commits May 8, 2020 12:20
…ository

- add option "exclude" to available options for future add to repo
- implement case to retrieve value of such option
- add new "exclude" option to default configuration file for parsing
- add new key to ExecParamRepository interface
…m value to String[]

- use casting to retrieve values from ExecParamRepository
- adapt ExecParamRepository interface to said change
- finalize variables
- remove non working type casting in JsonExecParamParser parsing method to enable use of String[]
- rework getValue and getExecParamValue methods of class ExecParam to return a string
- adapt tests and interface to said change
@lionel-nj lionel-nj self-assigned this May 11, 2020
@lionel-nj lionel-nj marked this pull request as draft May 11, 2020 14:50
@lionel-nj lionel-nj requested a review from a user May 11, 2020 14:50
Lionel N added 9 commits May 12, 2020 11:25
… options (except --exclude)

- implement method to check the validity of an option:
  - checks if the option is already defined
  - checks if the option can have multiple values
- use class constants to initialize availableOptions in ApacheExecParamParser instead of hard coding the values
- modify logic to retrieve HELP_KEY or PROTO_KEY value from ExecParamValue: old logic retrieved a wrong value for said keys
- remove blank lines
- adapt tests to last changes: number of methods getLongOpt/getValues calls, and return value of method getExecParamValue on HELP_KEY
… validate

- implement use case determining the list of files to validate (CSV+GTFS)
- rework tests related to said use case
- make use case declarations in config
- initialize lists representing of files:
  - present in the GTFS archive (required + optional) (a)
  - to exclude from validation  (b)
  - to process (a)-(b)
Copy link
Copy Markdown

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See remarks inline

Lionel N added 3 commits May 14, 2020 09:57
- define option in the ExecParam once
- use said definition for ApacheExecParamParser instead of duplicating the definition
- rework tests:
  - adapt tests to said changes
  - use constant defined in ExecParamRepository interface
…tring[]

- adapt ExecParamTest to said change
- adapt InMemoryExecParamRepository to said change
- change default value for "exclude" option in default-execution-parameters.json
- adapt ApacheExecParamParser to handle multiple arguments for same option
  - adapt test to said change
- adapt JsonExecParamParser to handle multiple arguments for same option
- rework GTFS schema
- comment tests
@lionel-nj lionel-nj linked an issue May 20, 2020 that may be closed by this pull request
lionel-nj added 7 commits May 22, 2020 09:45
- make input format of exclusion option a single string
- adapt tests
- implement new class GtfsSchemaTree to deserialize gtfs schema.json
  - write tests to verify class methods
- implement new class GtfsNode to represent nodes of GtfsSchemaTree
  - write tests to verify class methods
- add jackson annotation to handle reference inside gtfs schema.json
- split string on different regex: string should not contain space char
- adapt GenerateExclusionFilenameList use case to handle GtfsSchemaTree structure
- use references inside .json for future parsing
lionel-nj added 2 commits May 25, 2020 15:53
…fsNode

- create new adapter module "tree" used to read schema.json with annotations that allow references inside .json file
  - update settings.gradle
  - implement new class GtfsNodeMaker to deserialize schema.json via jackson
    - update build.gradle for adapter/parser
    - write tests

- implement new class GtfsSchemaParser to go from schema.json to a GtfsNode representation
  - add field gtfsSchema to InMemoryGtfsSpecRepository use said class via interface
    - adapt tests
      - add file test-schema.json to test resources

 - implement new class GtfsNode containing information regarding the GTFS file dependency as a tree
   - implement DFS
   - write tests
… getGtfsDependencySchema method from InMemoryGtfsSpec
Copy link
Copy Markdown

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress. Changes we discussed can be found inline

lionel-nj added 7 commits May 26, 2020 09:59
- adapt tests
- remove unnecessary methods
- remove unused default class constructor
- removed unused setters
- adapt code to new definition of commands/options: remove unnecessary method checkExistingOptions
… Use .removeAll method of List interface

- Rework return type of usecases
- write test to verify that exception is thrown
Copy link
Copy Markdown

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline remark
As you mentioned, test for exclusion list generation to be completed

lionel-nj added 4 commits May 26, 2020 18:25
- add default constructor to GtfsNode class to enable mocking of class for use in tests
- write said tests
- improve Exception message when user input is not valid
@lionel-nj lionel-nj marked this pull request as ready for review May 26, 2020 22:53
Copy link
Copy Markdown

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A few remarks inline

lionel-nj added 5 commits May 28, 2020 09:46
…ge and test code

- add javadoc to tests
- modify information logging
  - adapt tests
…ge and test code

- add javadoc to tests
- modify information logging
  - adapt tests
… org.mobilitydata.gtfsvalidator.domain.entity.relationship_descriptor
@lionel-nj
Copy link
Copy Markdown
Contributor Author

@Fabrice-V As discussed, package org.mobilitydata.gtfsvalidator.domain.entity.schema has been renamed to org.mobilitydata.gtfsvalidator.domain.entity.relationship_descriptor in bb37f18

@lionel-nj lionel-nj merged commit 8a37779 into master May 28, 2020
@lionel-nj lionel-nj deleted the file-exclusion branch May 28, 2020 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[REQUEST] Execute GTFS validation on a subset of files

2 participants