Fix: attach data_quality_issues for list payloads in wrapper validation by emefienem · Pull Request #1151 · Tracer-Cloud/opensre

emefienem · 2026-04-30T16:37:28Z

Describe the changes you have made in this PR -

This PR fixes a bug in validate_host_metrics() where list-based payloads
did not properly aggregate and return data_quality_issues at the root level.

Changes made:

Updated list handling logic in validate_host_metrics
Ensured each list item is fully validated using validate_metrics
Aggregated all data_quality_issues from list items into root response
Ensured existing flat and nested structure behavior remains unchanged

Demo/Screenshot for feature changes and bug fixes -

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

No, I wrote all the code myself
[] Yes, I used AI assistance (continue below)

If you used AI assistance:

[] I have reviewed every single line of the AI-generated code
[] I can explain the purpose and logic of each function/component I added
I have tested edge cases and understand how the code handles them
I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

Checklist before requesting a review

I have added proper PR title and linked to the issue
I have performed a self-review of my code
I can explain the purpose of every function, class, and logic block I added
I understand why my changes work and have tested them thoroughly
I have considered potential edge cases and how my code handles them
If it is a core feature, I have added thorough tests
My code follows the project's style guidelines and conventions

Note: Please check Allow edits from maintainers if you would like us to assist in the PR.

greptile-apps · 2026-04-30T16:40:24Z

Greptile Summary

This PR fixes the bug in validate_host_metrics where data_quality_issues were never surfaced at the root level for list-based payloads (previously collected by iterating validated_data after _validate_flat_metrics, which never writes that key to individual points).

P1 — structural regression in list items: Switching to validator.validate_metrics(data_point.copy()) for each flat data point triggers _validate_cpu_metric and _validate_disk_metric on scalar values, wrapping cpu and disk with {"raw": ..., "validated": False} dicts instead of preserving the original integers. A simpler fix would be to keep _validate_flat_metrics per item and collect issues from validator.issues after each call, avoiding the double-dispatch into the nested-structure branches.
P1 — test gap: test_wrapper_list_structure_payload only asserts "data_quality_issues" in result and does not assert the preserved scalar types of cpu/disk inside result["data"], so the regression passes undetected.

Confidence Score: 3/5

Not safe to merge as-is — the fix introduces a structural regression that changes the shape of every validated list item.

Two P1 findings: the switch to validate_metrics on individual flat items silently replaces scalar cpu/disk values with {"raw": ..., "validated": False} dicts, and the new test only checks for the presence of data_quality_issues rather than verifying the unchanged shape of the data items, leaving the regression undetected.

app/tools/utils/data_validation.py (line 398) and tests/tools/utils/test_data_validation.py (lines 129–138) both need attention before merging.

Important Files Changed

Filename	Overview
app/tools/utils/data_validation.py	The switch from `_validate_flat_metrics` to `validate_metrics` for each list item correctly aggregates `data_quality_issues`, but `validate_metrics` also runs `_validate_cpu_metric`/`_validate_disk_metric` on scalar values, wrapping them in `{"raw": ..., "validated": False}` dicts and silently changing the data shape of every list item.
tests/tools/utils/test_data_validation.py	Removes the `xfail` marker from `test_wrapper_list_structure_payload`, but the test only asserts that `data_quality_issues` is present, leaving the structural regression in individual data-point fields uncaught.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["validate_host_metrics(metrics)"] --> B{Is metrics a dict?}
    B -- No --> C["Return error with data_quality_issues"]
    B -- Yes --> D{"'data' key present and is a list?"}
    D -- No --> E["validator.validate_metrics(metrics) — flat/nested path"]
    D -- Yes --> F["Loop over data points"]
    F --> G{Is data_point a dict?}
    G -- No --> H["Append as-is"]
    G -- Yes --> I["validator.validate_metrics(data_point) ⚠️ wraps scalar cpu/disk"]
    I --> J{"data_quality_issues in validated_point?"}
    J -- Yes --> K["Extend all_issues; del from validated_point"]
    J -- No --> L["Skip"]
    K --> M["Append validated_point"]
    L --> M
    H --> M
    M --> F
    F --> N["result = metrics.copy(); result['data'] = validated_data"]
    N --> O{any all_issues?}
    O -- Yes --> P["result['data_quality_issues'] = all_issues"]
    O -- No --> Q["Return result"]
    P --> Q

Comments Outside Diff (1)

tests/tools/utils/test_data_validation.py, line 129-138 (link)

Test coverage is too narrow to catch the structural regression

test_wrapper_list_structure_payload only asserts "data_quality_issues" in result. It does not verify the shape of the individual data points inside result["data"] — in particular that cpu and disk remain scalars and are not wrapped in {"raw": ..., "validated": False} dicts. Adding assertions such as isinstance(result["data"][0]["cpu"], (int, float)) and isinstance(result["data"][0]["disk"], (int, float)) would have caught the regression introduced by the change on line 398.

_{Reviews (1): Last reviewed commit: "Fix: attach data_quality_issues for list..." | Re-trigger Greptile}

greptile-apps · 2026-04-30T16:40:29Z

            if isinstance(data_point, dict):
-                # Validate each data point
-                validated_point = validator._validate_flat_metrics(data_point.copy())
+                validated_point = validator.validate_metrics(data_point.copy())


validate_metrics mutates scalar cpu/disk in each list item

Switching from _validate_flat_metrics to validate_metrics fixes issue aggregation, but introduces a structural regression. validate_metrics runs _validate_cpu_metric and _validate_disk_metric on any key named "cpu" or "disk", and both methods return {"raw": <value>, "validated": False} when the input is not a dict (i.e. a plain integer). For a flat data point like {"cpu": 95, "ram": 8589934592, "disk": 50}, the returned point inside validated_data will have cpu and disk replaced with nested dicts instead of their original scalar values — silently changing the shape of every list item in the response.

muddlebee · 2026-04-30T16:55:20Z

@emefienem review points, CI checks

VaibhavUpreti

awesome @emefienem merging this for velocity!

github-actions · 2026-04-30T19:36:53Z

😤 @emefienem said "I will fix this" and then actually fixed it. Legendary behavior.

👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

emefienem · 2026-05-01T08:15:32Z

awesome @emefienem merging this for velocity!

🫡

Fix: attach data_quality_issues for list payloads in wrapper validation

ba292ae

greptile-apps Bot reviewed Apr 30, 2026

View reviewed changes

VaibhavUpreti approved these changes Apr 30, 2026

View reviewed changes

VaibhavUpreti merged commit bee6d61 into Tracer-Cloud:main Apr 30, 2026
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: attach data_quality_issues for list payloads in wrapper validation#1151

Fix: attach data_quality_issues for list payloads in wrapper validation#1151
VaibhavUpreti merged 1 commit intoTracer-Cloud:mainfrom
emefienem:fix-list-aggregation-bug

emefienem commented Apr 30, 2026

Uh oh!

greptile-apps Bot commented Apr 30, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Apr 30, 2026

Uh oh!

muddlebee commented Apr 30, 2026

Uh oh!

VaibhavUpreti left a comment

Uh oh!

Uh oh!

github-actions Bot commented Apr 30, 2026

Uh oh!

emefienem commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

emefienem commented Apr 30, 2026

Describe the changes you have made in this PR -

Changes made:

Demo/Screenshot for feature changes and bug fixes -

Code Understanding and AI Usage

Checklist before requesting a review

Uh oh!

greptile-apps Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Flowchart

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

muddlebee commented Apr 30, 2026

Uh oh!

VaibhavUpreti left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented Apr 30, 2026

Uh oh!

emefienem commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

greptile-apps Bot commented Apr 30, 2026 •

edited

Loading