This repository was archived by the owner on May 21, 2024. It is now read-only.

Proposed Best Practice: Update practice for publishing future service #48

@e-lo

Description

📝 This falls into the category of issues where we (Cal-ITP) know there is great confusion and many divergent practices, but don't yet know with certainty what the best practice should be. I can, however, verify that the currently published best practice is out of date and insufficient.

Current Relevant Best Practice

One GTFS dataset should contain current and upcoming service (sometimes called a “merged” dataset). The merge function of Google's transitfeed tool can be used to create a merged dataset from two different GTFS feeds.

  • At any time, the published GTFS dataset should be valid for at least the next 7 days, and ideally for as long as the operator is confident that the schedule will continue to be operated.
  • If possible, the GTFS dataset should cover at least the next 30 days of service.
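The 7-day and 30-day rules above are mechanically checkable. Below is a minimal sketch of such a check (function names and message strings are my own, not an established tool, and it assumes a standard GTFS zip): the service horizon is the latest `end_date` in `calendar.txt`, plus any added-service exceptions in `calendar_dates.txt`.

```python
import csv
import io
import zipfile
from datetime import date

def _parse(yyyymmdd):
    """Convert a GTFS date string (YYYYMMDD) to a datetime.date."""
    return date(int(yyyymmdd[:4]), int(yyyymmdd[4:6]), int(yyyymmdd[6:8]))

def feed_end_date(feed_path):
    """Return the last date with scheduled service, from calendar.txt
    end_date values plus calendar_dates.txt added-service exceptions
    (exception_type == 1). Returns None if neither file has dates."""
    latest = None
    with zipfile.ZipFile(feed_path) as z:
        names = set(z.namelist())
        if "calendar.txt" in names:
            text = io.TextIOWrapper(z.open("calendar.txt"), encoding="utf-8-sig")
            for row in csv.DictReader(text):
                d = row["end_date"].strip()
                latest = d if latest is None else max(latest, d)
        if "calendar_dates.txt" in names:
            text = io.TextIOWrapper(z.open("calendar_dates.txt"), encoding="utf-8-sig")
            for row in csv.DictReader(text):
                if row.get("exception_type", "").strip() == "1":
                    d = row["date"].strip()
                    latest = d if latest is None else max(latest, d)
    return _parse(latest) if latest else None

def check_horizon(feed_path, today=None):
    """Flag feeds that violate the 7-day rule or miss the 30-day goal."""
    today = today or date.today()
    end = feed_end_date(feed_path)
    if end is None:
        return "invalid: no service dates found"
    days = (end - today).days
    if days < 7:
        return "invalid: feed should cover at least the next 7 days"
    if days < 30:
        return "warning: feed covers fewer than the suggested 30 days"
    return "ok"
```

A check like this could run in CI before publishing, so a feed never silently drops below the 7-day floor.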

Needs

  1. Reference currently relevant tools: the transitfeed library has effectively been deprecated, so the best practice should not reference it or encourage agencies to use it to solve this issue.
  2. Accommodate large feeds: Some agencies with very large and complex feeds have been asked by GTFS consumers to split their feed in two because of its size, even with just a single service_id. Adding multiple service_ids would likely overwhelm their systems. Additionally, many feeds hosted on GitHub infrastructure are limited to files under 100 MB, and duplicating service_ids would push them past that limit.
  3. Don't risk validation of the currently posted feed: One of the main reasons for posting future service in advance is to ensure that any problems with the future service data are identified early. Posting (relatively) untested data within the same feed risks rendering the whole feed invalid. While this may not be a major issue for large consumers who ingest the data on a fixed schedule, it hurts the overall accessibility of the feed. This is especially important as agencies add new features/attributes to their feeds or update their business processes for dataset production.

Solutions Considered

Continue to suggest a merged feed with different service_ids for:

  • datasets which don't add new fields/features or dramatically new services, either of which might risk validation of the dataset containing current service.
  • small/medium feeds which would produce merged datasets under XXX MB (not sure what this number should be).
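With transitfeed deprecated, the merge itself is not complicated in principle. Here is a deliberately naive sketch in plain Python (my own illustration, not an existing tool): it concatenates two feeds and prefixes service_id/trip_id in the future feed so IDs can't collide. A real merge would also need to reconcile shared stops, routes, agency records, and other ID columns such as shape_id.

```python
import csv
import io
import zipfile

# Columns whose values get prefixed in the future feed so they cannot
# collide with IDs in the current feed. (Illustrative; a production
# merge must handle more ID columns and de-duplicate shared records.)
ID_COLUMNS = {"service_id", "trip_id"}

def merge_gtfs(current_path, future_path, out_path, prefix="future_"):
    """Concatenate two GTFS feeds into one zip, prefixing service and
    trip IDs in the future feed to keep them distinct."""
    tables = {}  # filename -> list of row dicts

    for path, pfx in ((current_path, ""), (future_path, prefix)):
        with zipfile.ZipFile(path) as z:
            for name in z.namelist():
                if not name.endswith(".txt"):
                    continue
                text = io.TextIOWrapper(z.open(name), encoding="utf-8-sig")
                for row in csv.DictReader(text):
                    if pfx:
                        for col in ID_COLUMNS & row.keys():
                            row[col] = pfx + row[col]
                    tables.setdefault(name, []).append(row)

    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as z:
        for name, rows in tables.items():
            # Union of columns across both feeds, in first-seen order;
            # rows missing a column get an empty value.
            fields = list(dict.fromkeys(k for r in rows for k in r))
            buf = io.StringIO()
            writer = csv.DictWriter(buf, fieldnames=fields, restval="")
            writer.writeheader()
            writer.writerows(rows)
            z.writestr(name, buf.getvalue())
```

Note that this roughly doubles the size of shared files like stops.txt, which is exactly the large-feed concern in Need 2 above.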

❓ Are there currently maintained tools for merging datasets that preserve all current attributes/files?

Otherwise, we have suggested using a separate permalink URL for future service. For example, Los Angeles Metro publishes their “future-service” feed to a different Git branch, enabling a permalink download at https://gitlab.com/LACMTA/gtfs_bus/-/blob/future-service/gtfs_bus.zip


Thoughts here?

@gcamp @Cristhian-HA @scmcca @westontrillium and others!
