Commit 06b2156
refactor: Address PR comments.
1 parent: d6e63dc

5 files changed: +40, -102 lines


src/main/java/com/conveyal/datatools/manager/jobs/MergeFeedsJob.java
Lines changed: 5 additions & 76 deletions
@@ -55,7 +55,8 @@
  * found in any other feed version. Note: There is absolutely no attempt to merge
  * entities based on either expected shared IDs or entity location (e.g., stop
  * coordinates).
- * - {@link MergeFeedsType#SERVICE_PERIOD}: this strategy is defined in detail at https://github.com/conveyal/datatools-server/issues/185,
+ * - {@link MergeFeedsType#SERVICE_PERIOD}:
+ *   this strategy is defined in detail at https://github.com/conveyal/datatools-server/issues/185,
  * but in essence, this strategy attempts to merge an active and future feed into
  * a combined file. For certain entities (specifically stops and routes) it uses
  * alternate fields as primary keys (stop_code and route_short_name) if they are
@@ -66,55 +67,6 @@
  * prefer entities from the active version, so that entities edited in Data Tools would override the values found
  * in the "future" file, which may have limited data attributes due to being exported from scheduling software with
  * limited GTFS support.
- *
- * Reproduced from https://github.com/conveyal/datatools-server/issues/185 on 2019/04/23:
- *
- * 1. When a new GTFS+ feed is loaded in TDM, check as part of the loading and validation process if
- *    the dataset is for a future date. (If all services start in the future, consider the dataset
- *    to be for the future).
- * 2. If it is a future dataset, automatically notify the user that the feed needs to be merged with
- *    most recent active version or a selected one in order to further process the feed.
- * 3. Use the chosen version to merge the future feed. The merging process needs to be efficient so
- *    that the user doesn’t need to wait more than a tolerable time.
- * 4. The merge process shall compare the active and future datasets, validate the following rules
- *    and generate the Merge Validation Report:
- *    i. Merging will be based on route_short_name in the active and future datasets. All matching
- *    route_short_names between the datasets shall be considered same route. Any route_short_name
- *    in active data not present in the future will be appended to the future routes file.
- *    ii. Future feed_info.txt file should get priority over active feed file when difference is
- *    identified.
- *    iii. When difference is found in agency.txt file between active and future feeds, the future
- *    agency.txt file data should be used. Possible issue with missing agency_id referenced by routes
- *    iv. When stop_code is included, stop merging will be based on that. If stop_code is not
- *    included, it will be based on stop_id. All stops in future data will be carried forward and
- *    any stops found in active data that are not in the future data shall be appended. If one
- *    of the feed is missing stop_code, merge fails with a notification to the user with
- *    suggestion that the feed with missing stop_code must be fixed with stop_code.
- *    v. If any service_id in the active feed matches with the future feed, it should be modified
- *    and all associated trip records must also be changed with the modified service_id.
- *    If a service_id from the active calendar has both the start_date and end_date in the
- *    future, the service shall not be appended to the merged file. Records in trips,
- *    calendar_dates, and calendar_attributes referencing this service_id shall also be
- *    removed/ignored. Stop_time records for the ignored trips shall also be removed.
- *    If a service_id from the active calendar has only the end_date in the future, the end_date
- *    shall be set to one day prior to the earliest start_date in future dataset before appending
- *    the calendar record to the merged file.
- *    trip_ids between active and future datasets must not match. If any trip_id is found to be
- *    matching, the merge should fail with appropriate notification to user with the cause of the
- *    failure. Notification should include all matched trip_ids.
- *    vi. New shape_ids in the future datasets should be appended in the merged feed.
- *    vii. Merging fare_attributes will be based on fare_id in the active and future datasets. All
- *    matching fare_ids between the datasets shall be considered same fare. Any fare_id in active
- *    data not present in the future will be appended to the future fare_attributes file.
- *    viii. All fare rules from the future dataset will be included. Any identical fare rules from
- *    the active dataset will be discarded. Any fare rules unique to the active dataset will be
- *    appended to the future file.
- *    ix. All transfers.txt entries with unique stop pairs (from - to) from both the future and
- *    active datasets will be included in the merged file. Entries with duplicate stop pairs from
- *    the active dataset will be discarded.
- *    x. All GTFS+ files should be merged based on how the associated base GTFS file is merged. For
- *    example, directions for routes that are not in the future routes.txt file should be appended
- *    to the future directions.txt file in the merged feed.
  */
 public class MergeFeedsJob extends FeedSourceJob {
@@ -143,31 +95,13 @@ public class MergeFeedsJob extends FeedSourceJob {
     // Variables used for a service period merge.
     private FeedMergeContext feedMergeContext;

-    public MergeFeedsJob(Auth0UserProfile owner, Set<FeedVersion> feedVersions, String file, MergeFeedsType mergeType) {
-        this(owner, feedVersions, file, mergeType, true);
-    }
-
-    /** Shorthand method to get the future feed during a service period merge */
-    @BsonIgnore @JsonIgnore
-    public FeedToMerge getFutureFeed() {
-        return feedMergeContext.futureFeedToMerge;
-    }
-
-    /** Shorthand method to get the active feed during a service period merge */
-    @BsonIgnore @JsonIgnore
-    public FeedToMerge getActiveFeed() {
-        return feedMergeContext.activeFeedToMerge;
-    }
-
     /**
      * @param owner user ID that initiated job
      * @param feedVersions set of feed versions to merge
      * @param file resulting merge filename (without .zip)
      * @param mergeType the type of merge to perform {@link MergeFeedsType}
-     * @param storeNewVersion whether to store merged feed as new version
      */
-    public MergeFeedsJob(Auth0UserProfile owner, Set<FeedVersion> feedVersions, String file,
-        MergeFeedsType mergeType, boolean storeNewVersion) {
+    public MergeFeedsJob(Auth0UserProfile owner, Set<FeedVersion> feedVersions, String file, MergeFeedsType mergeType) {
         super(owner, mergeType.equals(REGIONAL) ? "Merging project feeds" : "Merging feed versions",
             JobType.MERGE_FEED_VERSIONS);
         this.feedVersions = feedVersions;
@@ -182,7 +116,7 @@ public MergeFeedsJob(Auth0UserProfile owner, Set<FeedVersion> feedVersions, Stri
         // Grab parent feed source depending on merge type.
         FeedSource regionalFeedSource = null;
         // If storing a regional merge as a new version, find the feed source designated by the project.
-        if (mergeType.equals(REGIONAL) && storeNewVersion) {
+        if (mergeType.equals(REGIONAL)) {
             regionalFeedSource = Persistence.feedSources.getById(project.regionalFeedSourceId);
             // Create new feed source if this is the first regional merge.
             if (regionalFeedSource == null) {
@@ -201,7 +135,7 @@ public MergeFeedsJob(Auth0UserProfile owner, Set<FeedVersion> feedVersions, Stri
             : feedVersions.iterator().next().parentFeedSource();
         // Assuming job is successful, mergedVersion will contain the resulting feed version.
         // Merged version will be null if the new version should not be stored.
-        this.mergedVersion = getMergedVersion(this, storeNewVersion);
+        this.mergedVersion = getMergedVersion(this, true);
         this.mergeFeedsResult = new MergeFeedsResult(mergeType);
     }
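The two hunks above finish removing the `storeNewVersion` flag: every remaining caller passed `true`, so the parameter is dropped and the constant is inlined at the one remaining use. A minimal sketch of that pattern, using hypothetical stand-in names rather than the real datatools classes:

```java
// Hedged sketch (hypothetical MergeJob/getMergedVersion stand-ins, not the
// real datatools-server classes): once every caller passes true for a
// boolean flag, the flag parameter can be removed and the constant inlined.
class MergeJob {
    final String mergedVersion;

    // After the refactor there is a single constructor; the old
    // (file, storeNewVersion) overload that forwarded `true` is gone.
    MergeJob(String file) {
        this.mergedVersion = getMergedVersion(file, true);
    }

    static String getMergedVersion(String file, boolean storeNewVersion) {
        // Returns null when the merged feed should not be stored.
        return storeNewVersion ? file + ".zip" : null;
    }
}
```

The upside is a smaller public surface; the trade-off is that re-introducing an opt-out later means touching the constructor signature again.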

@@ -210,11 +144,6 @@ public Set<FeedVersion> getFeedVersions() {
         return this.feedVersions;
     }

-    @BsonIgnore @JsonIgnore
-    public List<FeedToMerge> getFeedsToMerge() {
-        return this.feedMergeContext.feedsToMerge;
-    }
-
     /**
      * The final stage handles clean up (deleting temp file) and adding the next job to process the
      * new merged version (assuming the merge did not fail).

src/main/java/com/conveyal/datatools/manager/jobs/feedmerge/MergeFeedsResult.java
Lines changed: 0 additions & 2 deletions

@@ -18,8 +18,6 @@ public class MergeFeedsResult implements Serializable {
     /** Type of merge operation performed */
     public MergeFeedsType type;
     public MergeStrategy mergeStrategy = MergeStrategy.DEFAULT;
-    /** Contains a set of strings for which there were error-causing duplicate values */
-    public Set<String> idConflicts = new HashSet<>();
     /** Contains the set of IDs for records that were excluded in the merged feed */
     public Set<String> skippedIds = new HashSet<>();
     /**

src/main/java/com/conveyal/datatools/manager/jobs/feedmerge/MergeLineContext.java
Lines changed: 12 additions & 13 deletions

@@ -41,7 +41,6 @@
 public class MergeLineContext {
     protected static final String AGENCY_ID = "agency_id";
     protected static final String SERVICE_ID = "service_id";
-    private static final String STOPS = "stops";
     private static final Logger LOG = LoggerFactory.getLogger(MergeLineContext.class);
     protected final MergeFeedsJob job;
     private final ZipOutputStream out;
@@ -91,7 +90,7 @@ public static MergeLineContext create(MergeFeedsJob job, Table table, ZipOutputS
             return new RoutesMergeLineContext(job, table, out);
         case "shapes":
             return new ShapesMergeLineContext(job, table, out);
-        case STOPS:
+        case "stops":
             return new StopsMergeLineContext(job, table, out);
         case "trips":
             return new TripsMergeLineContext(job, table, out);
@@ -268,21 +267,17 @@ public void checkFieldsForMergeConflicts(Set<NewGTFSError> idErrors) throws IOEx
     }

     private Set<NewGTFSError> getIdErrors() {
-        Set<NewGTFSError> idErrors;
-        Field field = fieldContext.getField();
+        String fieldValue;
         // If analyzing the second feed (active feed), the service_id always gets feed scoped.
         // See https://github.com/ibi-group/datatools-server/issues/244
         if (handlingActiveFeed && fieldNameEquals(SERVICE_ID)) {
             updateAndRemapOutput();
-            idErrors = referenceTracker
-                .checkReferencesAndUniqueness(keyValue, lineNumber, field, fieldContext.getValueToWrite(),
-                    table, keyField, orderField);
+            fieldValue = fieldContext.getValueToWrite();
         } else {
-            idErrors = referenceTracker
-                .checkReferencesAndUniqueness(keyValue, lineNumber, field, fieldContext.getValue(),
-                    table, keyField, orderField);
+            fieldValue = fieldContext.getValue();
         }
-        return idErrors;
+        return referenceTracker.checkReferencesAndUniqueness(keyValue, lineNumber, fieldContext.getField(),
+            fieldValue, table, keyField, orderField);
     }

     protected void checkRoutesAndStopsIds(Set<NewGTFSError> idErrors) throws IOException {
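The `getIdErrors` rewrite above removes duplication by selecting the one argument that differs between the branches first, then making a single call. A hedged sketch of the same pattern with stand-in names (not the actual `referenceTracker` API):

```java
import java.util.HashSet;
import java.util.Set;

// Hedged sketch of the getIdErrors simplification (stand-in names, not the
// datatools-server API): rather than repeating the whole checker call in
// both branches, pick the one differing value first and call once.
class IdErrorCheck {
    static String getValue() { return "route_1"; }             // raw field value
    static String getValueToWrite() { return "feed:route_1"; } // feed-scoped value

    static Set<String> getIdErrors(boolean handlingActiveFeed) {
        String fieldValue = handlingActiveFeed ? getValueToWrite() : getValue();
        return checkReferencesAndUniqueness(fieldValue);
    }

    // Stand-in for referenceTracker.checkReferencesAndUniqueness(...).
    static Set<String> checkReferencesAndUniqueness(String value) {
        Set<String> errors = new HashSet<>();
        if (value.isEmpty()) {
            errors.add("missing id");
        }
        return errors;
    }
}
```

Keeping a single call site also means a later change to the checker's argument list only has to be made once.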
@@ -345,7 +340,11 @@ protected void checkRoutesAndStopsIds(Set<NewGTFSError> idErrors) throws IOExcep
         // where two routes have different short_names, but share the same route_id. We want
         // both of these routes to end up in the merged feed in this case because we're
         // matching on short name, so we must modify the route_id.
-        if (!skipRecord && !referenceTracker.transitIds.contains(String.join(":", keyField, keyValue)) && hasDuplicateError(primaryKeyErrors)) {
+        if (
+            !skipRecord &&
+            !referenceTracker.transitIds.contains(String.join(":", keyField, keyValue)) &&
+            hasDuplicateError(primaryKeyErrors)
+        ) {
             // Modify route_id and ensure that referencing trips
             // have route_id updated.
             updateAndRemapOutput();
@@ -398,7 +397,7 @@ public boolean storeRowAndStopValues() {
             // Store row values for route or stop ID (or alternative ID field) in order
             // to check for ID conflicts. NOTE: This is only intended to be used for
             // routes and stops. Otherwise, this might (will) consume too much memory.
-            case STOPS:
+            case "stops":
             case "routes":
                 // FIXME: This should be revised for tables with order fields, but it should work fine for its
                 // primary purposes: to detect exact copy rows and to temporarily hold the data in case a reference

src/main/java/com/conveyal/datatools/manager/jobs/feedmerge/StopsMergeLineContext.java
Lines changed: 6 additions & 2 deletions

@@ -25,15 +25,19 @@ public StopsMergeLineContext(MergeFeedsJob job, Table table, ZipOutputStream out

     @Override
     public void checkFirstLineConditions() throws IOException {
-        checkStopCodeStuff();
+        checkThatStopCodesArePopulatedWhereRequired();
     }

     @Override
     public void checkFieldsForMergeConflicts(Set<NewGTFSError> idErrors) throws IOException {
         checkRoutesAndStopsIds(idErrors);
     }

-    private void checkStopCodeStuff() throws IOException {
+    /**
+     * Checks that the stop_code field of the Stop entities to merge is populated where required.
+     * @throws IOException
+     */
+    private void checkThatStopCodesArePopulatedWhereRequired() throws IOException {
         if (shouldCheckStopCodes()) {
             // Before reading any lines in stops.txt, first determine whether all records contain
             // properly filled stop_codes. The rules governing this logic are as follows:
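The rename above documents what the check does: a service-period merge keys stops on stop_code, so records that need one must have it filled in. A minimal sketch of such a check, with hypothetical types (the real method reads stops.txt directly and applies additional rules not shown here):

```java
import java.util.List;
import java.util.Map;

// Hedged sketch of a "stop codes populated where required" check
// (hypothetical row representation, not the real datatools-server logic,
// which has further rules for feeds that omit stop_code entirely).
class StopCodeCheck {
    /** Returns true only if every stop record carries a non-blank stop_code. */
    static boolean stopCodesPopulated(List<Map<String, String>> stopRows) {
        return stopRows.stream()
            .map(row -> row.get("stop_code"))           // may be null if column absent
            .allMatch(code -> code != null && !code.isBlank());
    }
}
```

Failing this check up front, before any lines are merged, is what lets the job report "fix the feed's stop_codes" instead of producing a half-merged file.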

src/main/java/com/conveyal/datatools/manager/utils/MergeFeedUtils.java
Lines changed: 17 additions & 9 deletions

@@ -50,11 +50,16 @@ public static Set<String> getIdsForTable(ZipFile zipFile, Table table) throws IO
             LOG.warn("Table {} not found in zip file: {}", table.name, zipFile.getName());
             return ids;
         }
-        Field[] fieldsFoundInZip = table.getFieldsFromFieldHeaders(csvReader.getHeaders(), null);
-        // Get the key field (id value) for each row.
-        int keyFieldIndex = getFieldIndex(fieldsFoundInZip, keyField);
-        while (csvReader.readRecord()) ids.add(csvReader.get(keyFieldIndex));
-        csvReader.close();
+        try {
+            Field[] fieldsFoundInZip = table.getFieldsFromFieldHeaders(csvReader.getHeaders(), null);
+            // Get the key field (id value) for each row.
+            int keyFieldIndex = getFieldIndex(fieldsFoundInZip, keyField);
+            while (csvReader.readRecord()) {
+                ids.add(csvReader.get(keyFieldIndex));
+            }
+        } finally {
+            csvReader.close();
+        }
         return ids;
     }
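The hunk above wraps the read loop in try/finally so the CSV reader is closed even when header parsing or `readRecord` throws. With a `java.io` reader (which, unlike some CSV readers, implements `AutoCloseable`), try-with-resources gives the same guarantee; a self-contained sketch under that assumption:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Sketch of the close-on-all-paths guarantee the try/finally above provides,
// expressed with try-with-resources. Assumes a plain comma-separated input
// whose first column is the key field, loosely mirroring getIdsForTable.
class IdReader {
    static Set<String> readIds(String csv) throws IOException {
        Set<String> ids = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            String line;
            while ((line = reader.readLine()) != null) {
                ids.add(line.split(",")[0]); // first column holds the id
            }
        } // reader is closed here even if readLine throws
        return ids;
    }
}
```

The commit uses explicit try/finally rather than try-with-resources, presumably because the reader type in use does not implement `AutoCloseable`; both forms prevent the leaked-file-handle bug the original straight-line code had.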

@@ -117,10 +122,13 @@ public static Set<Field> getAllFields(List<FeedToMerge> feedsToMerge, Table tabl
         if (csvReader == null) {
             continue;
         }
-        // Get fields found from headers and add them to the shared fields set.
-        Field[] fieldsFoundInZip = table.getFieldsFromFieldHeaders(csvReader.getHeaders(), null);
-        sharedFields.addAll(Arrays.asList(fieldsFoundInZip));
-        csvReader.close();
+        try {
+            // Get fields found from headers and add them to the shared fields set.
+            Field[] fieldsFoundInZip = table.getFieldsFromFieldHeaders(csvReader.getHeaders(), null);
+            sharedFields.addAll(Arrays.asList(fieldsFoundInZip));
+        } finally {
+            csvReader.close();
+        }
     }
     return sharedFields;
 }
