Skip to content

Configurable error limit#531

Merged
irees merged 4 commits intomainfrom
configurable-error-limit
Dec 5, 2025
Merged

Configurable error limit#531
irees merged 4 commits intomainfrom
configurable-error-limit

Conversation

@irees
Copy link
Copy Markdown
Contributor

@irees irees commented Dec 5, 2025

No description provided.

Copilot AI review requested due to automatic review settings December 5, 2025 01:50
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds a configurable error threshold feature that allows GTFS validation and import operations to fail when error rates exceed specified percentages per file. The feature provides fine-grained control over acceptable error levels with both default and file-specific thresholds.

Key Changes:

  • Implemented CheckErrorThreshold method in the copier package to calculate and check error percentages per file
  • Added ErrorThreshold configuration option to both validator and importer packages
  • Created CLI flag --error-threshold for validate and import commands with format filename:percent

Reviewed changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
copier/result.go Adds error threshold checking logic and related data structures to calculate error percentages per file
copier/copier_test.go Comprehensive unit tests for the CheckErrorThreshold functionality covering various threshold scenarios
validator/validator.go Integrates error threshold checking into validation flow, failing validation when thresholds are exceeded
importer/importer.go Integrates error threshold checking into import flow and moves required file validation after threshold check
cmds/cmds.go New parsing utility to convert CLI threshold strings into structured map format with input validation
cmds/validator_cmd.go Adds --error-threshold CLI flag and wires parsing to validator options
cmds/import_cmd.go Adds --error-threshold CLI flag and wires parsing to importer options
doc/cli/*.md Updates auto-generated CLI documentation to reflect new flag and updated generation date

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@irees irees requested a review from Copilot December 5, 2025 02:22
@irees irees merged commit d92a7ea into main Dec 5, 2025
10 checks passed
@irees irees deleted the configurable-error-limit branch December 5, 2025 02:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}

// Check required files have at least minimum entities
requiredMinEntities := map[string]int{"agency.txt": 1, "routes.txt": 1}
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the importer, the validator previously checked that agency.txt, routes.txt, and stops.txt all had entities (implied by the removed code), but the new CheckRequiredMinEntities only checks agency.txt and routes.txt. This is a behavioral change that reduces validation strictness. If stops.txt is no longer required to have entities, this should be documented. Otherwise, stops.txt should be added to the requiredMinEntities map.

Copilot uses AI. Check for mistakes.
Comment on lines +247 to +279
func (cr *Result) CheckRequiredMinEntities(requirements map[string]int) RequiredMinEntitiesResult {
result := RequiredMinEntitiesResult{
OK: true,
Details: map[string]RequiredMinEntitiesFileResult{},
}
if len(requirements) == 0 {
return result
}

// Sort filenames for deterministic order
filenames := make([]string, 0, len(requirements))
for fn := range requirements {
filenames = append(filenames, fn)
}
sort.Strings(filenames)

for _, fn := range filenames {
required := requirements[fn]
entityCount := cr.EntityCount[fn]
ok := entityCount >= required

result.Details[fn] = RequiredMinEntitiesFileResult{
TotalCount: entityCount,
Required: required,
OK: ok,
}
if !ok {
result.OK = false
}
}

return result
}
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CheckRequiredMinEntities method lacks test coverage. While CheckErrorThreshold has comprehensive tests in copier/copier_test.go (lines 136-298), there are no tests for this new method. Consider adding tests to verify behavior such as: files meeting requirements, files failing requirements, empty requirements, and multiple files with mixed pass/fail results.

Copilot uses AI. Check for mistakes.
// A threshold of 0 means any error is a failure. Files are only checked if they
// have an explicit threshold or a default "*" threshold is set.
// The thresholds map uses filename as key (e.g., "stops.txt") with "*" as the default.
// The error rate is calculated as (entity errors + reference errors) / total entities * 100.
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation states "The error rate is calculated as (entity errors + reference errors) / total entities * 100" but this is inaccurate. The actual calculation (line 201, 205) is: (entity errors + reference errors) / (entity count + entity errors + reference errors) * 100. The denominator includes both successful entities AND errors, not just total entities. Consider updating the documentation to: "The error rate is calculated as (entity errors + reference errors) / (successful entities + errors) * 100."

Suggested change
// The error rate is calculated as (entity errors + reference errors) / total entities * 100.
// The error rate is calculated as (entity errors + reference errors) / (successful entities + errors) * 100.

Copilot uses AI. Check for mistakes.
Comment on lines +245 to +286
// Check error threshold
if len(v.Options.ErrorThreshold) > 0 {
thresholdResult := cpResult.CheckErrorThreshold(v.Options.ErrorThreshold)
if !thresholdResult.OK {
var exceededFiles []string
for fn, detail := range thresholdResult.Details {
if !detail.OK {
log.For(context.TODO()).Error().Str("filename", fn).Float64("error_percent", detail.ErrorPercent).Float64("threshold", detail.Threshold).Int("error_count", detail.ErrorCount).Int("total_count", detail.TotalCount).Msg("file exceeded error threshold")
exceededFiles = append(exceededFiles, fn)
}
}
sort.Strings(exceededFiles)
var errMsgs []string
for _, fn := range exceededFiles {
detail := thresholdResult.Details[fn]
errMsgs = append(errMsgs, fmt.Sprintf("%s: %.2f%% errors (threshold: %.2f%%)", fn, detail.ErrorPercent, detail.Threshold))
}
result.FailureReason.Set(fmt.Sprintf("error threshold exceeded: %s", strings.Join(errMsgs, "; ")))
return result, nil
}
}

// Check required files have at least minimum entities
requiredMinEntities := map[string]int{"agency.txt": 1, "routes.txt": 1}
minEntitiesResult := cpResult.CheckRequiredMinEntities(requiredMinEntities)
if !minEntitiesResult.OK {
var failedFiles []string
for fn, detail := range minEntitiesResult.Details {
if !detail.OK {
log.For(context.TODO()).Error().Str("filename", fn).Int("total_count", detail.TotalCount).Int("required", detail.Required).Msg("file did not meet required minimum entities")
failedFiles = append(failedFiles, fn)
}
}
sort.Strings(failedFiles)
var errMsgs []string
for _, fn := range failedFiles {
detail := minEntitiesResult.Details[fn]
errMsgs = append(errMsgs, fmt.Sprintf("%s: %d entities (required: %d)", fn, detail.TotalCount, detail.Required))
}
result.FailureReason.Set(fmt.Sprintf("required minimum entities not met: %s", strings.Join(errMsgs, "; ")))
return result, nil
}
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error threshold and minimum entities checking logic (lines 245-286) is duplicated in importer/importer.go (lines 175-214). Both implementations follow the same pattern: iterate through failed files, log errors, sort filenames, build error messages, and return/set failure. Consider extracting this into a shared helper function to reduce duplication and ensure consistency. For example, a function like checkThresholdsAndMinimums(cpResult, errorThreshold, requiredMinEntities, ctx) that returns error messages or nil.

Copilot uses AI. Check for mistakes.
}

// Check required files have at least minimum entities
requiredMinEntities := map[string]int{"agency.txt": 1, "routes.txt": 1}
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old required files check validated that agency.txt, routes.txt, and stops.txt all had at least one entity. The new CheckRequiredMinEntities only checks agency.txt and routes.txt, omitting stops.txt. This is a behavioral change that reduces validation strictness. If stops.txt is no longer required to have entities, this should be documented in the PR description. Otherwise, stops.txt should be added to the requiredMinEntities map.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants