This repository was archived by the owner on May 21, 2024. It is now read-only.

Proposed Best Practice: Update practice for publishing future service #48

@e-lo

Description

📝 This falls into the category of issues where we (Cal-ITP) know there is great confusion and many divergent practices, but don't yet know with certainty what the best practice should be. I can, however, verify that the currently published best practice is out of date and insufficient.

Current Relevant Best Practice

One GTFS dataset should contain current and upcoming service (sometimes called a “merged” dataset). The merge function of Google's transitfeed tool can be used to create a merged dataset from two different GTFS feeds.

  • At any time, the published GTFS dataset should be valid for at least the next 7 days, and ideally for as long as the operator is confident that the schedule will continue to be operated.
  • If possible, the GTFS dataset should cover at least the next 30 days of service.
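The 7-day and 30-day rules above are mechanically checkable. Below is a minimal sketch of such a check (function names and message strings are my own, not an established tool, and it assumes a standard GTFS zip): the service horizon is the latest `end_date` in `calendar.txt`, plus any added-service exceptions in `calendar_dates.txt`.

```python
import csv
import io
import zipfile
from datetime import date

def _parse(yyyymmdd):
    """Convert a GTFS date string (YYYYMMDD) to a datetime.date."""
    return date(int(yyyymmdd[:4]), int(yyyymmdd[4:6]), int(yyyymmdd[6:8]))

def feed_end_date(feed_path):
    """Return the last date with scheduled service, from calendar.txt
    end_date values plus calendar_dates.txt added-service exceptions
    (exception_type == 1). Returns None if neither file has dates."""
    latest = None
    with zipfile.ZipFile(feed_path) as z:
        names = set(z.namelist())
        if "calendar.txt" in names:
            text = io.TextIOWrapper(z.open("calendar.txt"), encoding="utf-8-sig")
            for row in csv.DictReader(text):
                d = row["end_date"].strip()
                latest = d if latest is None else max(latest, d)
        if "calendar_dates.txt" in names:
            text = io.TextIOWrapper(z.open("calendar_dates.txt"), encoding="utf-8-sig")
            for row in csv.DictReader(text):
                if row.get("exception_type", "").strip() == "1":
                    d = row["date"].strip()
                    latest = d if latest is None else max(latest, d)
    return _parse(latest) if latest else None

def check_horizon(feed_path, today=None):
    """Flag feeds that violate the 7-day rule or miss the 30-day goal."""
    today = today or date.today()
    end = feed_end_date(feed_path)
    if end is None:
        return "invalid: no service dates found"
    days = (end - today).days
    if days < 7:
        return "invalid: feed should cover at least the next 7 days"
    if days < 30:
        return "warning: feed covers fewer than the suggested 30 days"
    return "ok"
```

A check like this could run in CI before publishing, so a feed never silently drops below the 7-day floor.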

Needs

  1. Reference currently relevant tools: the transitfeed library has effectively been deprecated, so the best practice should not reference it or encourage agencies to use it to solve this issue.
  2. Accommodate large feeds: Some agencies with very large and complex feeds have been asked by GTFS consumers to split their feed in two because of its size, even with just a single service_id. Adding multiple service_ids would likely overwhelm their systems. Additionally, many feeds hosted on GitHub infrastructure are limited to files under 100 MB, and duplicating service_ids would push them past that limit.
  3. Don't risk validation of the currently posted feed: One of the main reasons for posting future service in advance is to ensure that any problems with the future service data are identified early. Posting (relatively) untested data within the same feed risks rendering the whole feed invalid. While this may not be a major issue for large consumers who ingest the data on a fixed schedule, it hurts the overall accessibility of the feed. This is especially important as agencies add new features/attributes to their feeds or update their business processes for dataset production.

Solutions Considered

Continue to suggest a merged feed with different service_ids for:

  • datasets which don't add new fields/features or dramatically new services, either of which might risk validation of the dataset containing current service.
  • small/medium feeds which would produce merged datasets under XXX MB (not sure what this number should be).
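With transitfeed deprecated, the merge itself is not complicated in principle. Here is a deliberately naive sketch in plain Python (my own illustration, not an existing tool): it concatenates two feeds and prefixes service_id/trip_id in the future feed so IDs can't collide. A real merge would also need to reconcile shared stops, routes, agency records, and other ID columns such as shape_id.

```python
import csv
import io
import zipfile

# Columns whose values get prefixed in the future feed so they cannot
# collide with IDs in the current feed. (Illustrative; a production
# merge must handle more ID columns and de-duplicate shared records.)
ID_COLUMNS = {"service_id", "trip_id"}

def merge_gtfs(current_path, future_path, out_path, prefix="future_"):
    """Concatenate two GTFS feeds into one zip, prefixing service and
    trip IDs in the future feed to keep them distinct."""
    tables = {}  # filename -> list of row dicts

    for path, pfx in ((current_path, ""), (future_path, prefix)):
        with zipfile.ZipFile(path) as z:
            for name in z.namelist():
                if not name.endswith(".txt"):
                    continue
                text = io.TextIOWrapper(z.open(name), encoding="utf-8-sig")
                for row in csv.DictReader(text):
                    if pfx:
                        for col in ID_COLUMNS & row.keys():
                            row[col] = pfx + row[col]
                    tables.setdefault(name, []).append(row)

    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as z:
        for name, rows in tables.items():
            # Union of columns across both feeds, in first-seen order;
            # rows missing a column get an empty value.
            fields = list(dict.fromkeys(k for r in rows for k in r))
            buf = io.StringIO()
            writer = csv.DictWriter(buf, fieldnames=fields, restval="")
            writer.writeheader()
            writer.writerows(rows)
            z.writestr(name, buf.getvalue())
```

Note that this roughly doubles the size of shared files like stops.txt, which is exactly the large-feed concern in Need 2 above.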

❓ Are there currently maintained tools for merging datasets that preserve all current attributes/files?

Otherwise, we have suggested using a separate permalink URL for future service. For example, Los Angeles Metro publishes their “future-service” feed to a different Git branch, enabling a permalink download at https://gitlab.com/LACMTA/gtfs_bus/-/blob/future-service/gtfs_bus.zip


Thoughts here?

@gcamp @Cristhian-HA @scmcca @westontrillium and others!
