
Proposal: enable direct deployment of validator as a web service #1184

@themightychris

Description


Feature request

Background

  • Organizations want to make use of the canonical validator in custom pipelines and user interfaces, but often lack Java expertise
  • MobilityData may want to host a public web-based validation user interface in the future
  • The dominant pattern for operationalizing scalable computing resources today is to publish a Docker container image that exposes an HTTP service, which can be pooled behind a load balancer and scaled horizontally as needed (i.e., by increasing or decreasing the number of replicas available in the pool)
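For readers less familiar with horizontal scaling, the sketch below models the core replica-count rule used by the Kubernetes Horizontal Pod Autoscaler, the kind of scaling a Helm chart with autoscaling would configure. The desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up, with no change when that ratio is within a tolerance of 1.0 (0.1 by default). This is a simplified model that ignores stabilization windows, pod readiness, and missing metrics, and the min_replicas/max_replicas defaults are hypothetical placeholders rather than values from this proposal.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified Kubernetes HPA rule: ceil(current * current_metric / target_metric).

    The min_replicas/max_replicas defaults are hypothetical; in a real deployment
    they come from the HorizontalPodAutoscaler spec.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never scale outside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# Four validator pods averaging 90% CPU against a 60% target scale out to six.
print(desired_replicas(4, current_metric=90, target_metric=60))  # 6
```

Because each validation request is independent, a stateless validator service fits this model well: the autoscaler only has to add or remove identical replicas behind the load balancer.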

Proposed solution

A relatively small amount of work could significantly improve the utility of the gtfs-validator repository:

  • Merge feat(ci): handle gradle build in Dockerfile #1120, bringing the Docker container image published to ghcr.io in line with Docker best practices
  • Integrate an optional HTTP server into the validator jar (and the standard Docker container image published via ghcr.io), potentially based on the work in feat: html output, web front end #1088 but excluding the frontend code and HTML responses. When the Docker container is invoked in the form docker run ghcr.io/mobilitydata/gtfs-validator:v3.1.0 --web_server, the server would expose two JSON API endpoints:
    • GET /ready: returns HTTP status 200 if the HTTP service is available and ready to accept validation requests
    • POST /validate: accepts a GTFS zip file and returns a JSON report
  • Add an openapi.yaml file to the root of the repository documenting the two available HTTP endpoints in OpenAPI 3.0 format
  • Provide an example Docker command line for running the ghcr.io-published Docker container image as a web service.
    • This provides a baseline reference that deployers can translate to any container-based deployment system
  • Publish a usable Helm chart for deploying the validator as a Kubernetes Deployment with horizontal pod autoscaling configured
    • This provides a more fully featured reference for a scalable deployment that those using Kubernetes can deploy point-and-click to any managed provider
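The two proposed endpoints could be consumed with nothing more than an HTTP client, which is what makes this approach attractive to teams without Java expertise. The Python sketch below is a hypothetical client, not part of the proposal: the base URL http://localhost:8080, the application/zip content type, and sending the zip as the raw request body are all assumptions, since the proposal only states that POST /validate accepts a GTFS zip file and returns a JSON report, leaving the exact contract to openapi.yaml.

```python
import json
import urllib.request

# Hypothetical address of a locally running container; the proposal does not fix a port.
BASE_URL = "http://localhost:8080"


def is_ready(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if GET /ready answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/ready", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses: a refused connection,
        # a timeout, or a non-2xx status all mean "not ready".
        return False


def build_validate_request(zip_bytes: bytes, base_url: str = BASE_URL) -> urllib.request.Request:
    # Assumption: the feed is sent as the raw request body with an application/zip
    # content type; a multipart upload would be an equally plausible contract.
    return urllib.request.Request(
        f"{base_url}/validate",
        data=zip_bytes,
        headers={"Content-Type": "application/zip"},
        method="POST",
    )


def validate_feed(zip_bytes: bytes, base_url: str = BASE_URL, timeout: float = 300.0) -> dict:
    """POST a GTFS zip to /validate and return the decoded JSON report."""
    request = build_validate_request(zip_bytes, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

A caller would typically poll is_ready() until it returns True and then call validate_feed(Path("feed.zip").read_bytes()). A load balancer or Kubernetes readiness probe would use GET /ready in the same way to decide whether a replica should receive traffic.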

Key outcomes

Organizations operationalizing the canonical validator as part of their infrastructure can:

  • utilize docker run ghcr.io/mobilitydata/gtfs-validator:v3.1.0 --web_server as a well-defined and stable runtime semantic
  • utilize the HTTP API documented in openapi.yaml as a well-defined and stable contract for internal services that use the validator
  • quickly increment the validator version number deployed as MobilityData publishes new versions

Out of scope

This proposal does not suggest implementing within the validator repository:

  • a web-based user interface (but this work would be an enabling step for various paths to deploying a web-based user interface in the future)
  • any sort of access control (common approaches to deploying HTTP web services all address this at a higher level, e.g., via private networking or reverse proxies)
  • any sort of performance optimization (horizontal container scaling can serve as a kludge while potential performance improvements are pursued in parallel)

Alternatives

  • Organizations can build their own web services that use the Java APIs to bind to the validator's JAR file
    • Many organizations lack the Java development expertise necessary to leverage Java interfaces directly
  • Organizations can build workflows that invoke the CLI and then collect its output files
    • This approach requires users to build a lot of repetitive glue code to invoke the validator within a network services environment and to make the validator available as a scalable computing service
    • The current semantics do not follow POSIX-style input/output conventions: rather than executing the validator and then reading the results back from persistent storage, custom pipelines would be better served by a validator that accepts input on STDIN and returns results over STDOUT
  • Implement a web service outside the validator repository
    • This is a feasible alternative, but it would be ideal if the officially maintained ghcr.io container image were ready for direct deployment to modern container-based infrastructure like Kubernetes or Cloud Run. Deploying the first-party artifact directly into their infrastructure, behind a simple and well-known HTTP API, would help consumers stay up to date with the latest validator by letting them move between validator versions seamlessly
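To illustrate the glue code described under the CLI alternative, the sketch below adapts today's file-based CLI to the STDIN/STDOUT style preferred above: it writes the zip read from STDIN into a temporary directory, runs the CLI jar, and prints the JSON report to STDOUT. The jar filename is a hypothetical placeholder, and the -i/-o flags and the report.json output name reflect the v3 CLI as we understand it; treat them as assumptions to check against the CLI's --help output for the version you deploy.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical path: point this at the validator CLI jar you actually downloaded.
VALIDATOR_JAR = "gtfs-validator-cli.jar"


def build_cli_command(jar, input_zip, output_dir):
    # Assumed flags: -i for the input feed, -o for the output directory.
    return ["java", "-jar", str(jar), "-i", str(input_zip), "-o", str(output_dir)]


def validate_bytes(zip_bytes, jar=VALIDATOR_JAR, run=subprocess.run):
    """Run the CLI on an in-memory GTFS zip and return the parsed JSON report."""
    with tempfile.TemporaryDirectory() as tmp:
        input_zip = Path(tmp) / "feed.zip"
        output_dir = Path(tmp) / "output"
        input_zip.write_bytes(zip_bytes)
        output_dir.mkdir()
        # check=True raises CalledProcessError if the validator exits non-zero.
        run(build_cli_command(jar, input_zip, output_dir), check=True)
        # Assumed default report filename written by the CLI.
        return json.loads((output_dir / "report.json").read_text())


def main():
    # Read a GTFS zip from STDIN and write the JSON report to STDOUT (java must be on PATH).
    report = validate_bytes(sys.stdin.buffer.read())
    json.dump(report, sys.stdout, indent=2)
```

Called from a script entry point (for example a hypothetical validate_stdin.py whose last line is main()), this supports python validate_stdin.py < feed.zip > report.json. Every organization currently has to write and maintain some variant of this wrapper, plus the service layer around it; a built-in web server would reduce it to a single HTTP call.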

Implementation

If this approach is acceptable to the maintainers, @JarvusInnovations will (funder approval pending) provide a ready-to-merge PR implementing all of the above.

Metadata

Labels

  • enhancement (New feature request or improvement on an existing feature)
  • epic (Used for bigger pieces of work. Usually split between several issues.)
  • status: Work in progress (A PR that would close this issue has been opened.)

Projects

Status

Done
