Configurable error limit by irees · Pull Request #531 · interline-io/transitland-lib

irees · 2025-12-05T01:50:37Z

No description provided.

Copilot

Pull request overview

This pull request adds a configurable error threshold feature that allows GTFS validation and import operations to fail when error rates exceed specified percentages per file. The feature provides fine-grained control over acceptable error levels with both default and file-specific thresholds.

Key Changes:

Implemented CheckErrorThreshold method in the copier package to calculate and check error percentages per file
Added ErrorThreshold configuration option to both validator and importer packages
Created CLI flag --error-threshold for validate and import commands with format filename:percent

Reviewed changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`copier/result.go`	Adds error threshold checking logic and related data structures to calculate error percentages per file
`copier/copier_test.go`	Comprehensive unit tests for the `CheckErrorThreshold` functionality covering various threshold scenarios
`validator/validator.go`	Integrates error threshold checking into validation flow, failing validation when thresholds are exceeded
`importer/importer.go`	Integrates error threshold checking into import flow and moves required file validation after threshold check
`cmds/cmds.go`	New parsing utility to convert CLI threshold strings into structured map format with input validation
`cmds/validator_cmd.go`	Adds `--error-threshold` CLI flag and wires parsing to validator options
`cmds/import_cmd.go`	Adds `--error-threshold` CLI flag and wires parsing to importer options
`doc/cli/*.md`	Updates auto-generated CLI documentation to reflect new flag and updated generation date

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

cmds/cmds.go

validator/validator.go

importer/importer.go

Copilot

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-05T02:27:16Z

validator/validator.go

+	}
+
+	// Check required files have at least minimum entities
+	requiredMinEntities := map[string]int{"agency.txt": 1, "routes.txt": 1}


Similar to the importer, the validator previously checked that agency.txt, routes.txt, and stops.txt all had entities (implied by the removed code), but the new CheckRequiredMinEntities only checks agency.txt and routes.txt. This is a behavioral change that reduces validation strictness. If stops.txt is no longer required to have entities, this should be documented. Otherwise, stops.txt should be added to the requiredMinEntities map.

Copilot · 2025-12-05T02:27:17Z

copier/result.go

+func (cr *Result) CheckRequiredMinEntities(requirements map[string]int) RequiredMinEntitiesResult {
+	result := RequiredMinEntitiesResult{
+		OK:      true,
+		Details: map[string]RequiredMinEntitiesFileResult{},
+	}
+	if len(requirements) == 0 {
+		return result
+	}
+
+	// Sort filenames for deterministic order
+	filenames := make([]string, 0, len(requirements))
+	for fn := range requirements {
+		filenames = append(filenames, fn)
+	}
+	sort.Strings(filenames)
+
+	for _, fn := range filenames {
+		required := requirements[fn]
+		entityCount := cr.EntityCount[fn]
+		ok := entityCount >= required
+
+		result.Details[fn] = RequiredMinEntitiesFileResult{
+			TotalCount: entityCount,
+			Required:   required,
+			OK:         ok,
+		}
+		if !ok {
+			result.OK = false
+		}
+	}
+
+	return result
+}


The CheckRequiredMinEntities method lacks test coverage. While CheckErrorThreshold has comprehensive tests in copier/copier_test.go (lines 136-298), there are no tests for this new method. Consider adding tests to verify behavior such as: files meeting requirements, files failing requirements, empty requirements, and multiple files with mixed pass/fail results.

Copilot · 2025-12-05T02:27:17Z

copier/result.go

+// A threshold of 0 means any error is a failure. Files are only checked if they
+// have an explicit threshold or a default "*" threshold is set.
+// The thresholds map uses filename as key (e.g., "stops.txt") with "*" as the default.
+// The error rate is calculated as (entity errors + reference errors) / total entities * 100.


The documentation states "The error rate is calculated as (entity errors + reference errors) / total entities * 100" but this is inaccurate. The actual calculation (line 201, 205) is: (entity errors + reference errors) / (entity count + entity errors + reference errors) * 100. The denominator includes both successful entities AND errors, not just total entities. Consider updating the documentation to: "The error rate is calculated as (entity errors + reference errors) / (successful entities + errors) * 100."

Suggested change

// The error rate is calculated as (entity errors + reference errors) / total entities * 100.

// The error rate is calculated as (entity errors + reference errors) / (successful entities + errors) * 100.

Copilot · 2025-12-05T02:27:17Z

validator/validator.go

+	// Check error threshold
+	if len(v.Options.ErrorThreshold) > 0 {
+		thresholdResult := cpResult.CheckErrorThreshold(v.Options.ErrorThreshold)
+		if !thresholdResult.OK {
+			var exceededFiles []string
+			for fn, detail := range thresholdResult.Details {
+				if !detail.OK {
+					log.For(context.TODO()).Error().Str("filename", fn).Float64("error_percent", detail.ErrorPercent).Float64("threshold", detail.Threshold).Int("error_count", detail.ErrorCount).Int("total_count", detail.TotalCount).Msg("file exceeded error threshold")
+					exceededFiles = append(exceededFiles, fn)
+				}
+			}
+			sort.Strings(exceededFiles)
+			var errMsgs []string
+			for _, fn := range exceededFiles {
+				detail := thresholdResult.Details[fn]
+				errMsgs = append(errMsgs, fmt.Sprintf("%s: %.2f%% errors (threshold: %.2f%%)", fn, detail.ErrorPercent, detail.Threshold))
+			}
+			result.FailureReason.Set(fmt.Sprintf("error threshold exceeded: %s", strings.Join(errMsgs, "; ")))
+			return result, nil
+		}
+	}
+
+	// Check required files have at least minimum entities
+	requiredMinEntities := map[string]int{"agency.txt": 1, "routes.txt": 1}
+	minEntitiesResult := cpResult.CheckRequiredMinEntities(requiredMinEntities)
+	if !minEntitiesResult.OK {
+		var failedFiles []string
+		for fn, detail := range minEntitiesResult.Details {
+			if !detail.OK {
+				log.For(context.TODO()).Error().Str("filename", fn).Int("total_count", detail.TotalCount).Int("required", detail.Required).Msg("file did not meet required minimum entities")
+				failedFiles = append(failedFiles, fn)
+			}
+		}
+		sort.Strings(failedFiles)
+		var errMsgs []string
+		for _, fn := range failedFiles {
+			detail := minEntitiesResult.Details[fn]
+			errMsgs = append(errMsgs, fmt.Sprintf("%s: %d entities (required: %d)", fn, detail.TotalCount, detail.Required))
+		}
+		result.FailureReason.Set(fmt.Sprintf("required minimum entities not met: %s", strings.Join(errMsgs, "; ")))
+		return result, nil
+	}


The error threshold and minimum entities checking logic (lines 245-286) is duplicated in importer/importer.go (lines 175-214). Both implementations follow the same pattern: iterate through failed files, log errors, sort filenames, build error messages, and return/set failure. Consider extracting this into a shared helper function to reduce duplication and ensure consistency. For example, a function like checkThresholdsAndMinimums(cpResult, errorThreshold, requiredMinEntities, ctx) that returns error messages or nil.

Copilot · 2025-12-05T02:27:18Z

importer/importer.go

+	}
+
+	// Check required files have at least minimum entities
+	requiredMinEntities := map[string]int{"agency.txt": 1, "routes.txt": 1}


The old required files check validated that agency.txt, routes.txt, and stops.txt all had at least one entity. The new CheckRequiredMinEntities only checks agency.txt and routes.txt, omitting stops.txt. This is a behavioral change that reduces validation strictness. If stops.txt is no longer required to have entities, this should be documented in the PR description. Otherwise, stops.txt should be added to the requiredMinEntities map.

irees added 2 commits December 4, 2025 17:49

Configurable maximum error threshold for success

58ea827

go generate

010e033

Copilot AI review requested due to automatic review settings December 5, 2025 01:50

Copilot started reviewing on behalf of irees December 5, 2025 01:51 View session

Copilot finished reviewing on behalf of irees December 5, 2025 01:53

Copilot AI reviewed Dec 5, 2025

View reviewed changes

cmds/cmds.go Show resolved Hide resolved

cmds/cmds.go Show resolved Hide resolved

cmds/cmds.go Show resolved Hide resolved

validator/validator.go Show resolved Hide resolved

importer/importer.go Show resolved Hide resolved

irees added 2 commits December 4, 2025 18:01

Additional checks

9a53583

Design changes

07435a8

irees requested a review from Copilot December 5, 2025 02:22

Copilot started reviewing on behalf of irees December 5, 2025 02:22 View session

Copilot finished reviewing on behalf of irees December 5, 2025 02:26

irees merged commit d92a7ea into main Dec 5, 2025
10 checks passed

irees deleted the configurable-error-limit branch December 5, 2025 02:26

Copilot AI reviewed Dec 5, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Configurable error limit#531

Configurable error limit#531
irees merged 4 commits intomainfrom
configurable-error-limit

irees commented Dec 5, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 5, 2025

Uh oh!

Copilot AI Dec 5, 2025

Uh oh!

Copilot AI Dec 5, 2025

Uh oh!

Copilot AI Dec 5, 2025

Uh oh!

Copilot AI Dec 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	// The error rate is calculated as (entity errors + reference errors) / total entities * 100.
	// The error rate is calculated as (entity errors + reference errors) / (successful entities + errors) * 100.

Conversation

irees commented Dec 5, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants