Conversation
|
With regard to: "Time - The field contains the time in the HH:MM:SS format (H:MM:SS is also accepted)" would have preferred if the optional H:MM:SS was removed. After loading data into a DB I look for anything like this and prepend it with a 0. Doing this allows the times to be compared as strings which means a DB clause such as departure_time>"07:35:00" works as expected. |
|
@harringtonp while I am not against what you suggest, but you can never expect a HH:MM:SS timestamp anyway. Because the time might even go up to HHH:MM:SS. Which is not supported by Google Transit, but certainly not against the spirit of the standard. The take away: HH is not 00-23, |
|
@LeoFrachet tightening definitions would be breaking backward compatibility, which has generally been avoided in the past. It might be nearly impossible to track down and force all data producers to change their previously-conforming output - my sense is that many of them cannot or will not do so because they paid an external contractor or vendor to make a one-off export tool. I believe things like phone numbers were intended for literal display to human users, so is there a need to specify an exact format? |
|
@harringtonp similar to my comment to @LeoFrachet, requiring two or three digits for the hours field would cause existing feeds to be invalid, and this type of change has been avoided in the past. Another point of view: arguably it is not optimal to store and sort times as strings - if these are treated as numbers (we typically convert to seconds after midnight) this eliminates the perceived problem. In any case, as @LeoFrachet said, factoring out definitions should be separate from changing syntax. If you have proposals for changing how times are expressed, those should be separate issues. |
@abyrd: I fully agree, and this is why this proposal (almost, see below) doesn't contains change in the definition of the types, I'm just grouping them on top of the spec to avoid duplication, and I'm adding examples to help comprehension. The only two "changes" that I've made are:
I can remove those two "changes" if they are controversial. |
slai
left a comment
There was a problem hiding this comment.
Looks good. Makes it easier to maintain, although at the slight cost of having to jump around the docs if you're unfamiliar with the types. I think consistent definitions of types is worth it though.
gtfs/spec/en/reference.md
Outdated
| - **ID** - The field contains an identifier which uniquely identifies an entity (therefore is dataset unique). Its value is not aimed to be displayed to the user, and is a sequence of any UTF-8 characters, but using only printable ASCII characters is recommended. | ||
| - **Language Code** - The field contains a IETF BCP 47 language code. For an introduction to IETF BCP 47, please refer to http://www.rfc-editor.org/rfc/bcp/bcp47.txt and http://www.w3.org/International/articles/language-tags/. For example: "en" for English, "en-US" for American English or "de" for German. | ||
| - **Latitude** - The field contains the WGS84 latitude in decimal degrees. The value must be greater than or equal to -90.0 and less than or equal to 90.0. For example: "41.890169" for the Colloseum in Rome. | ||
| - **Longitude** - The field contains the WGS84 longitude in decimal degrees. The value must be geater than or equal to -180.0 and less than or equal to 180.0. For example: "12.492269" for the Colloseum in Rome. |
There was a problem hiding this comment.
typo - greater
Also not sure if Colloseum is spelt correctly - Google disagrees https://www.google.com/search?q=colloseum&ie=utf-8&oe=utf-8&client=firefox-b-ab
There was a problem hiding this comment.
Thanks for both. Fixed.
|
About needing to jump around in the docs - any chance we can make links back to the definitions or somehow get them to show up in tooltips? Probably asking too much from markdown. In response to my own previous observation:
Looking at this again I realize few people would actually type a phone number, they'd probably tap it to auto-dial on their phone, so maybe format is important. But I think it should be a "strong recommendation" rather than a breaking spec change. |
|
I completely agree with the two changes you listed @LeoFrachet - the latitude one is clearly just a mistake, and it would be great to have full voluntary adoption of IDs being plain ASCII (excluding commas and whitespace). It would be interesting to do a scan of all existing known data to see if there are exceptions to that, and if not it might even be worth a breaking, more restrictive definition. |
Phone number formattingRegarding phone number formatting, my current proposal do not change the spec. I keep the current definition and I just move it. If we want to change it, I'm expecting a whole conversation will be needed, and I would rather do that separate. Jump around in the specI can add links to the type definition. Aka you could clic on "Date" and it would move up to the date. But IMHO the best case would be that we keep the raw spec on GitHub, and we have a nice and dynamic website showing the specification in a better way, e.g. gtfs.org. On this more complex page, we could for example have the possibility to "expend" the type definition when we want to see more, but it would be hidden by default to avoid cluttering it. But yeah I can add links to the type definition. IDs tighteningI fully agree that spaces and commas are also annoying, and I would be open to forbid them. We can ask big consumers to do a survey. Maybe I could just remove the sentence from this proposal and open another thread to really make it stricter. |
My preference is to keep the current text in this proposal for IDs, unless someone has a strong preference to remove it. It's only "recommended", so it's not breaking anything. |
|
Nothing to add at our side, we are totally inline with the proposal and the comments above. |
|
Since the From:
To:
|
|
This pull request has been open more than one week ago. As per the Official Process, we can now open the vote on it. Vote will be closed on Monday October 1st at 23:59:59 UTC. |
|
+1 |
3 similar comments
|
+1 |
|
+1 |
|
+1 |
|
+1 thanks for the effort |
|
+1 |
|
I am not sure how to vote? Is writing "+1" here, all I have to do to say I agree with this proposal ? |
|
+1 |
Here @mbta, we currently have a handful of IDs with spaces, but none with commas. It's possible that we'll be changing them in the future, but since we use them in many different places it's unlikely to happen soon. One other note about the documentation for IDs: we use them in URLs, so we consider them partially user-visible. |
|
@ECF-dot-com: Yes, you can just "+1" to give your approval. @paulswartz: Good to know about spaces in MBTA's ids. The current proposal under vote do not speak about spaces in ID. It just "strongly suggest" to avoid non ASCII character in it (e.g. ü, ß or ç). |
|
Just for the record, we have data with ids that contains "é". |
|
7 votes in favor (even 8 if we include the one from @ECF-dot-com). => The proposal is adopted. |
| | | | | • If this stop ID represents **a stop located inside a station**, this entry's location type must be **0 or blank**, and this entry's parent_station field contains **the stop ID of the station where this stop is located and the stop referenced by parent_station must have location_type=1.** | | ||
| | | | | • If this stop ID represents **a stop located outside a station**, this entry's location type must be **0 or blank**, and this entry's parent_station field contains **a blank value (the parent_station field doesn't apply to this stop).** | | ||
| | | | | • If this stop ID represents **a station**, this entry's location type must be **1**, and this entry's parent_station field contains **a blank value (stations can't contain other stations).** | | ||
| | stop_timezone | Timezone | Optional | The **stop_timezone** field contains the timezone in which this stop, station, or station entrance is located. If omitted, the stop should be assumed to be located in the timezone specified by **agency_timezone** in [agency.txt](#agencytxt). When a stop has a parent station, the stop is considered to be in the timezone specified by the parent station's **stop_timezone** value. If the parent has no stop_timezone value, the stops that belong to that station are assumed to be in the timezone specified by **agency_timezone**, even if the stops have their own **stop_timezone** values. In other words, if a given stop has a **parent_station** value, any **stop_timezone** value specified for that stop must be ignored. Even if **stop_timezone** values are provided in stops.txt, the times in [stop_times.txt](#stop_timestxt) should continue to be specified as time since midnight in the timezone specified by **agency_timezone** in agency.txt. This ensures that the time values in a trip always increase over the course of a trip, regardless of which timezones the trip crosses. | |
There was a problem hiding this comment.
git blame lead me here, I have a question.
The stop_timezone definition says that
[...] if
stop_timezonevalues are provided instops.txt, the times instop_times.txtshould continue to be specified as time since midnight in the timezone specified byagency_timezoneinagency.txt.
But the newly added GTFS Time definition says:
The time is measured from "noon minus 12h" of the service day (effectively midnight except for days on which daylight savings time changes occur).
I assume that the latter is the correct definition, and that the explanation for stop_timezone should be adapted?
Hi GTFS community!
We are starting to have some heavy duplication in the specification. The definition of an URL is given five times (
stop_url,route_url,feed_publisher_url,agency_url,agency_fare_url), the definition of a timezone is given twice, just like the definition of a language code or a service day or a color.This pull request offers to factorize those definitions of types on the top of the specification, to avoid duplication and make it more concise. It also makes it easier for consumers who wants to parse a GTFS and who wants to see which data is in which field.
Last but not least, it highlights the specific definition of "date" and "time" in a GTFS dataset, which isn't the usual definition of date and time.
You can read the pull request using the "Files changed" tab, or you can read the result formatted markdown, which might be easier:
https://github.com/MobilityData/transit/blob/field_types_2/gtfs/spec/en/reference.md#field-types
Thoughts?
Side note: I would have loved to tighten some definitions, e.g. to force the phone numbers to be in the E123 format, but I didn't do it because I wanted to keep this proposal as being just a factorization of definitions and not a change in the specification per se.