Skip to content

Add trip handling strategies for MTC feed merge#376

Merged
binh-dam-ibigroup merged 75 commits intodevfrom
mtc-merge-2021
Jan 14, 2022
Merged

Add trip handling strategies for MTC feed merge#376
binh-dam-ibigroup merged 75 commits intodevfrom
mtc-merge-2021

Conversation

@landonreed
Copy link
Copy Markdown
Contributor

@landonreed landonreed commented Apr 29, 2021

Checklist

  • Appropriate branch selected (all PRs must first be merged to dev before they can be merged to master)
  • Any modified or new methods or classes have helpful JavaDoc and code is thoroughly commented
  • The description lists all applicable issues this PR seeks to resolve
  • The description lists any configuration setting(s) that differ from the default settings
  • All tests and CI builds passing

Description

This PR implements changes to MTC's service period merge feature as defined here: https://docs.google.com/document/d/1mx9s3XfWfPB7SSXDwCbnRF7F2zLDU2ED4gK-W6eXL9o/edit

This PR also fixes #244.

Notable changes include:

  1. Moving a bunch of pre-existing things (FeedToMerge class and util methods) to new files to clean things up.
  2. Implementing MergeStrategy enum to mimic strategies in google doc.
  3. Update to gtfs-lib in order to get the StopTime#hashCode method.
  4. Some more tests.
  5. Note: Works with Mtc merge 2021 tweaks datatools-ui#686

@landonreed landonreed changed the title fix(MergeFeedsJob): add trip handling strategies for MTC merges Add trip handling strategies for MTC feed merge Apr 29, 2021
/**
* If service_ids and trip_ids in active feed are the same as future feed then the service end date for the
* merged feed shall match with future feed’s service end date and the service start date for the merged feed
* should be the merged date. All files from the future feed only shall be used in the merged feed.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear from this comment what "merged date" is. If a reference to either the active or future feed's dates could be made, that would make it explicit.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like the "reconciled" start and end date of the entry in the calendar table, but I agree that rewording this term would help.

/**
* If service_ids in active and future feed exactly match but only some of the trip_ids match then the merge
* strategy shall handle the following three cases:
* - *trip_id in both feeds*: The service shall start from the data merge date and end at the future feed’s service
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more explicit on what the data merge date would be here?

Comment on lines +36 to +37
* - *trip_id in past feed*: A new service shall be created starting from the merge date and expiring at the end
* of active service period.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify what "merge date" is and it'd be good to stick with the same terminology for past/active/future feed. Past is always active, right?

Comment on lines +37 to +46
CsvReader csvReader = table.getCsvReader(zipFile, null);
if (csvReader == null) {
LOG.warn("Table {} not found in zip file: {}", table.name, zipFile.getName());
return ids;
}
Field[] fieldsFoundInZip = table.getFieldsFromFieldHeaders(csvReader.getHeaders(), null);
// Get the key field (id value) for each row.
int keyFieldIndex = getFieldIndex(fieldsFoundInZip, keyField);
while (csvReader.readRecord()) ids.add(csvReader.get(keyFieldIndex));
csvReader.close();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure this CsvReader is always closed. Maybe use a try-with-resources block.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think try with resources can be used because the class does not implement AutoCloseable. Are there scenarios you've seen where this is not getting closed?

Copy link
Copy Markdown
Contributor

@evansiroky evansiroky Jun 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's any exception that occurs before the reader is closed, then the reader will be left open. You're right about CsvReader not implementing Closeable, so better wrap all this with try-finally.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this still needs to be implemented.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in 06b2156.

);
}

// /**
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment for why there is a big comment block here?

Comment on lines +1102 to +1103
pastServiceIds.addAll(currentFeed.idsForTable.get(Table.CALENDAR));
pastServiceIds.addAll(currentFeed.idsForTable.get(Table.CALENDAR_DATES));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it past or current. It'd be really helpful to pick one thing to call this thing to avoid confusion.

Copy link
Copy Markdown
Contributor

@binh-dam-ibigroup binh-dam-ibigroup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the formatting and refactoring suggestions for now.

/**
* If service_ids and trip_ids in active feed are the same as future feed then the service end date for the
* merged feed shall match with future feed’s service end date and the service start date for the merged feed
* should be the merged date. All files from the future feed only shall be used in the merged feed.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like the "reconciled" start and end date of the entry in the calendar table, but I agree that rewording this term would help.

Field[] fieldsFoundInZip = table.getFieldsFromFieldHeaders(csvReader.getHeaders(), null);
// Get the key field (id value) for each row.
int keyFieldIndex = getFieldIndex(fieldsFoundInZip, keyField);
while (csvReader.readRecord()) ids.add(csvReader.get(keyFieldIndex));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put the body of the while loop on a separate line and in brackets, please.

Copy link
Copy Markdown
Contributor

@evansiroky evansiroky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This MergeFeedsJob is becoming really complex to the point where it's kind of unclear what all is going on without doing a deep dive into it. To help with this, each step should be thoroughly commented and also, I think it'd really help to have consistent terminology about what past/current/future feeds are.

Copy link
Copy Markdown
Contributor

@binh-dam-ibigroup binh-dam-ibigroup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving following client approval.

@binh-dam-ibigroup binh-dam-ibigroup merged commit eaa68c6 into dev Jan 14, 2022
@binh-dam-ibigroup binh-dam-ibigroup deleted the mtc-merge-2021 branch January 14, 2022 14:28
@philip-cline philip-cline mentioned this pull request Feb 8, 2022
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MTC feed merge: feed-scope service_ids duplicated in both calendar_dates tables

6 participants