Conversation
There was a problem hiding this comment.
Pull request overview
This pull request adds a configurable error threshold feature that allows GTFS validation and import operations to fail when error rates exceed specified percentages per file. The feature provides fine-grained control over acceptable error levels with both default and file-specific thresholds.
Key Changes:
- Implemented
CheckErrorThresholdmethod in the copier package to calculate and check error percentages per file - Added
ErrorThresholdconfiguration option to both validator and importer packages - Created CLI flag
--error-thresholdfor validate and import commands with formatfilename:percent
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
copier/result.go |
Adds error threshold checking logic and related data structures to calculate error percentages per file |
copier/copier_test.go |
Comprehensive unit tests for the CheckErrorThreshold functionality covering various threshold scenarios |
validator/validator.go |
Integrates error threshold checking into validation flow, failing validation when thresholds are exceeded |
importer/importer.go |
Integrates error threshold checking into import flow and moves required file validation after threshold check |
cmds/cmds.go |
New parsing utility to convert CLI threshold strings into structured map format with input validation |
cmds/validator_cmd.go |
Adds --error-threshold CLI flag and wires parsing to validator options |
cmds/import_cmd.go |
Adds --error-threshold CLI flag and wires parsing to importer options |
doc/cli/*.md |
Updates auto-generated CLI documentation to reflect new flag and updated generation date |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 34 out of 34 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| // Check required files have at least minimum entities | ||
| requiredMinEntities := map[string]int{"agency.txt": 1, "routes.txt": 1} |
There was a problem hiding this comment.
Similar to the importer, the validator previously checked that agency.txt, routes.txt, and stops.txt all had entities (implied by the removed code), but the new CheckRequiredMinEntities only checks agency.txt and routes.txt. This is a behavioral change that reduces validation strictness. If stops.txt is no longer required to have entities, this should be documented. Otherwise, stops.txt should be added to the requiredMinEntities map.
| func (cr *Result) CheckRequiredMinEntities(requirements map[string]int) RequiredMinEntitiesResult { | ||
| result := RequiredMinEntitiesResult{ | ||
| OK: true, | ||
| Details: map[string]RequiredMinEntitiesFileResult{}, | ||
| } | ||
| if len(requirements) == 0 { | ||
| return result | ||
| } | ||
|
|
||
| // Sort filenames for deterministic order | ||
| filenames := make([]string, 0, len(requirements)) | ||
| for fn := range requirements { | ||
| filenames = append(filenames, fn) | ||
| } | ||
| sort.Strings(filenames) | ||
|
|
||
| for _, fn := range filenames { | ||
| required := requirements[fn] | ||
| entityCount := cr.EntityCount[fn] | ||
| ok := entityCount >= required | ||
|
|
||
| result.Details[fn] = RequiredMinEntitiesFileResult{ | ||
| TotalCount: entityCount, | ||
| Required: required, | ||
| OK: ok, | ||
| } | ||
| if !ok { | ||
| result.OK = false | ||
| } | ||
| } | ||
|
|
||
| return result | ||
| } |
There was a problem hiding this comment.
The CheckRequiredMinEntities method lacks test coverage. While CheckErrorThreshold has comprehensive tests in copier/copier_test.go (lines 136-298), there are no tests for this new method. Consider adding tests to verify behavior such as: files meeting requirements, files failing requirements, empty requirements, and multiple files with mixed pass/fail results.
| // A threshold of 0 means any error is a failure. Files are only checked if they | ||
| // have an explicit threshold or a default "*" threshold is set. | ||
| // The thresholds map uses filename as key (e.g., "stops.txt") with "*" as the default. | ||
| // The error rate is calculated as (entity errors + reference errors) / total entities * 100. |
There was a problem hiding this comment.
The documentation states "The error rate is calculated as (entity errors + reference errors) / total entities * 100" but this is inaccurate. The actual calculation (line 201, 205) is: (entity errors + reference errors) / (entity count + entity errors + reference errors) * 100. The denominator includes both successful entities AND errors, not just total entities. Consider updating the documentation to: "The error rate is calculated as (entity errors + reference errors) / (successful entities + errors) * 100."
| // The error rate is calculated as (entity errors + reference errors) / total entities * 100. | |
| // The error rate is calculated as (entity errors + reference errors) / (successful entities + errors) * 100. |
| // Check error threshold | ||
| if len(v.Options.ErrorThreshold) > 0 { | ||
| thresholdResult := cpResult.CheckErrorThreshold(v.Options.ErrorThreshold) | ||
| if !thresholdResult.OK { | ||
| var exceededFiles []string | ||
| for fn, detail := range thresholdResult.Details { | ||
| if !detail.OK { | ||
| log.For(context.TODO()).Error().Str("filename", fn).Float64("error_percent", detail.ErrorPercent).Float64("threshold", detail.Threshold).Int("error_count", detail.ErrorCount).Int("total_count", detail.TotalCount).Msg("file exceeded error threshold") | ||
| exceededFiles = append(exceededFiles, fn) | ||
| } | ||
| } | ||
| sort.Strings(exceededFiles) | ||
| var errMsgs []string | ||
| for _, fn := range exceededFiles { | ||
| detail := thresholdResult.Details[fn] | ||
| errMsgs = append(errMsgs, fmt.Sprintf("%s: %.2f%% errors (threshold: %.2f%%)", fn, detail.ErrorPercent, detail.Threshold)) | ||
| } | ||
| result.FailureReason.Set(fmt.Sprintf("error threshold exceeded: %s", strings.Join(errMsgs, "; "))) | ||
| return result, nil | ||
| } | ||
| } | ||
|
|
||
| // Check required files have at least minimum entities | ||
| requiredMinEntities := map[string]int{"agency.txt": 1, "routes.txt": 1} | ||
| minEntitiesResult := cpResult.CheckRequiredMinEntities(requiredMinEntities) | ||
| if !minEntitiesResult.OK { | ||
| var failedFiles []string | ||
| for fn, detail := range minEntitiesResult.Details { | ||
| if !detail.OK { | ||
| log.For(context.TODO()).Error().Str("filename", fn).Int("total_count", detail.TotalCount).Int("required", detail.Required).Msg("file did not meet required minimum entities") | ||
| failedFiles = append(failedFiles, fn) | ||
| } | ||
| } | ||
| sort.Strings(failedFiles) | ||
| var errMsgs []string | ||
| for _, fn := range failedFiles { | ||
| detail := minEntitiesResult.Details[fn] | ||
| errMsgs = append(errMsgs, fmt.Sprintf("%s: %d entities (required: %d)", fn, detail.TotalCount, detail.Required)) | ||
| } | ||
| result.FailureReason.Set(fmt.Sprintf("required minimum entities not met: %s", strings.Join(errMsgs, "; "))) | ||
| return result, nil | ||
| } |
There was a problem hiding this comment.
The error threshold and minimum entities checking logic (lines 245-286) is duplicated in importer/importer.go (lines 175-214). Both implementations follow the same pattern: iterate through failed files, log errors, sort filenames, build error messages, and return/set failure. Consider extracting this into a shared helper function to reduce duplication and ensure consistency. For example, a function like checkThresholdsAndMinimums(cpResult, errorThreshold, requiredMinEntities, ctx) that returns error messages or nil.
| } | ||
|
|
||
| // Check required files have at least minimum entities | ||
| requiredMinEntities := map[string]int{"agency.txt": 1, "routes.txt": 1} |
There was a problem hiding this comment.
The old required files check validated that agency.txt, routes.txt, and stops.txt all had at least one entity. The new CheckRequiredMinEntities only checks agency.txt and routes.txt, omitting stops.txt. This is a behavioral change that reduces validation strictness. If stops.txt is no longer required to have entities, this should be documented in the PR description. Otherwise, stops.txt should be added to the requiredMinEntities map.
No description provided.