
Conversation

@d-chambers d-chambers (Contributor) commented Aug 21, 2025

Description

This PR fixes issue #514. The ProdML parser now checks the last element of the time array and also accounts for the possibility that the time array is not in microseconds, making the parser more robust.
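The logic described above can be sketched roughly as follows. This is a minimal illustration, not dascore's actual implementation: `corrected_times` and its arguments are hypothetical names, and the raw timestamps are assumed here to be integer epoch microseconds.

```python
import numpy as np

def corrected_times(raw, start, end):
    """Sketch: rebuild time values when PartEndTime disagrees with the raw array.

    raw: integer timestamps from RawDataTime (assumed epoch microseconds here);
    start/end: datetime64 values parsed from the PartStartTime/PartEndTime attrs.
    """
    n = len(raw)
    step = (end - start) / (n - 1)
    computed_max = start + step * (n - 1)
    last = np.datetime64(int(raw[-1]), "us")
    # How far off is the metadata-derived end, in units of one sample step?
    drift = abs((computed_max - last) / step)
    if 0 < drift < 10:
        # Small drift: trust the last element of the raw array instead.
        return raw.astype("datetime64[us]")
    return np.arange(start, end + step, step)
```

When the metadata is consistent, the evenly spaced coordinate is kept; when the stored end time drifts by a few samples, the raw array wins.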

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes.
  • documented the new feature with docstrings or appropriate doc page.
  • included a test. See testing guidelines.
  • added my name to the contributors page (docs/contributors.md).
  • added the "ready_for_review" tag once the PR is ready to be reviewed.

Summary by CodeRabbit

  • New Features

    • None
  • Bug Fixes

    • Improved robustness when reading ProdML files with inconsistent or drifting end-time metadata so time coordinates are accurate and files load reliably.
  • Tests

    • Added tests that simulate incorrect PartEndTime metadata to verify files still read correctly.
    • Introduced a fixture to patch sample data for end-time validation.
  • Documentation

    • Cleaned a module docstring by removing a vendor-specific statement.
  • Refactor

    • Streamlined internal time handling without changing the public API.

@coderabbitai coderabbitai bot (Contributor) commented Aug 21, 2025

Walkthrough

Refactors ProdML time handling to load RawDataTime into a local array, add drift detection and re-computation using datetime64[us] when PartEndTime appears inaccurate, and tweak attribute assembly. Removes one sentence from the module docstring. Adds a test that patches an HDF5 file’s PartEndTime and verifies reading succeeds. No public API changes.

Changes

Cohort / File(s) — Summary

  • ProdML utils: time coordinate handling (dascore/io/prodml/utils.py): Added numpy as np. _get_time_coord now reads time_array, derives attrs from it, computes step from len(time_array)-1, and adds drift correction: if there is a small mismatch between the computed max and the last timestamp, cast time_array to datetime64[us] and recompute time_coord. Removed merging time attrs into info and assigned info["dims"] = dims via an intermediate dims variable.
  • ProdML package docs (dascore/io/prodml/__init__.py): Removed a descriptive sentence from the module docstring; no functional or API changes.
  • Tests for PartEndTime edge case (tests/test_io/test_prodml/test_prod_ml.py): Added h5py and get_coord imports. A new fixture copies and patches prodml_2.0.h5, sets PartEndTime based on a recalculated end time, and a test (test_issue_514) reads the patched file and asserts a dc.Patch is returned, covering the PartEndTime drift scenario.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor TestUser
  participant Reader as dc.read
  participant Utils as _get_time_coord
  participant H5 as HDF5 (ProdML)

  TestUser->>Reader: request read(file)
  Reader->>H5: open file, access Acquisition/Raw*/RawDataTime
  Reader->>Utils: compute time coordinate
  Utils->>H5: load time_array and attrs
  Utils->>Utils: compute step and initial time_coord
  alt small drift detected (0 < diff < 10)
    Utils->>Utils: cast time_array -> datetime64[us]
    Utils->>Utils: recompute time_coord via get_coord
    note right of Utils: Adjust PartEndTime-derived coordinate
  else no small drift
    Utils->>Utils: keep initial time_coord
  end
  Utils-->>Reader: return time_coord
  Reader-->>TestUser: return Patch with coords

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I nibbled on timestamps, crunchy and fine,
A hop to the end time—was it out of line?
I twitched my ears, recalculated the trail,
Drift tamed softly, no chance to fail.
Now the patch reads true—thump-thump, hooray! 🥕⏱️



📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6955147 and 109222f.

📒 Files selected for processing (1)
  • dascore/io/prodml/utils.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • dascore/io/prodml/utils.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code (macos-latest, 3.12)
  • GitHub Check: test_code (ubuntu-latest, 3.10)
  • GitHub Check: test_code (ubuntu-latest, 3.11)
  • GitHub Check: test_code (macos-latest, 3.10)
  • GitHub Check: test_code (ubuntu-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (macos-latest, 3.11)
  • GitHub Check: test_code_min_deps (macos-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (macos-latest, 3.13)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)


@codecov codecov bot commented Aug 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.85%. Comparing base (2d2a7dd) to head (109222f).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #515   +/-   ##
=======================================
  Coverage   99.85%   99.85%           
=======================================
  Files         118      118           
  Lines        9713     9724   +11     
=======================================
+ Hits         9699     9710   +11     
  Misses         14       14           
Flag: unittests | Coverage Δ: 99.85% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@coderabbitai coderabbitai bot (Contributor) left a comment
Actionable comments posted: 2

🧹 Nitpick comments (3)
tests/test_io/test_prodml/test_prod_ml.py (3)

7-7: h5py import is fine; consider importorskip if environments vary.

If CI ever runs without h5py, prefer pytest.importorskip("h5py") to avoid hard failures. Not blocking if h5py is guaranteed in test deps.


58-73: Fix issue reference and misleading comment; optionally normalize ISO formatting.

  • Docstring says “See #412” but the PR addresses #514. Update to avoid confusion.
  • The inline comment says “monkey patch dimensions,” but this fixture patches PartEndTime. Update the comment.
  • Optional: use np.datetime_as_string(..., unit="us") to ensure a consistent ISO-8601 string; some files expect precise formatting.

Apply this diff:

-    def issue_514_patch_path(self, tmp_path_factory):
-        """Make a patch with bad endtime metadata. See #412."""
+    def issue_514_patch_path(self, tmp_path_factory):
+        """Make a patch with bad end-time metadata. See #514."""
@@
-            # monkey patch dimensions to simulate issue.
+            # Monkey-patch end time to simulate bad PartEndTime metadata.
@@
-            new_time = str(time_coord.max() + time_coord.step * 2)
+            new_time = np.datetime_as_string(
+                time_coord.max() + time_coord.step * 2, unit="us"
+            )

Note: If you adopt np.datetime_as_string, add import numpy as np at the top of this file.
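To illustrate the formatting point (this snippet is not from the PR): str() of a datetime64 echoes the value's own precision, while np.datetime_as_string with unit="us" always emits six fractional digits, which is why it gives more consistent strings.

```python
import numpy as np

t = np.datetime64("2020-01-01T00:00:00")  # second precision
print(str(t))                               # no fractional digits
print(np.datetime_as_string(t, unit="us"))  # always microsecond precision
```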


97-101: Test coverage LGTM; consider asserting behavior, not just type.

The test currently asserts the read returns a Patch. Consider also asserting the derived time coordinate matches the last element of RawDataTime (after your fix), or at least that time length matches RawDataTime length, to prove the end-time correction works.

Example follow-up assertion:

time_array = dc.read(issue_514_patch_path)[0].coords.get_array("time")
assert time_array[-1] <= time_array.max()  # or a stricter equality if appropriate
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 2d2a7dd and 6955147.

📒 Files selected for processing (3)
  • dascore/io/prodml/__init__.py (0 hunks)
  • dascore/io/prodml/utils.py (3 hunks)
  • tests/test_io/test_prodml/test_prod_ml.py (3 hunks)
💤 Files with no reviewable changes (1)
  • dascore/io/prodml/__init__.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_io/test_prodml/test_prod_ml.py (3)
dascore/core/coords.py (3)
  • get_coord (1450-1638)
  • data (589-591)
  • max (442-444)
dascore/utils/downloader.py (1)
  • fetch (40-58)
dascore/io/core.py (2)
  • read (469-478)
  • read (584-648)
dascore/io/prodml/utils.py (3)
dascore/utils/misc.py (1)
  • unbyte (357-361)
dascore/core/coords.py (3)
  • get_coord (1450-1638)
  • max (442-444)
  • data (589-591)
dascore/io/dasdae/utils.py (1)
  • _get_dims (145-152)
🔇 Additional comments (3)
dascore/io/prodml/utils.py (2)

5-6: Import of NumPy looks appropriate.

Using NumPy here is justified for datetime arithmetic and array handling.


127-129: No changes needed: dims is a valid attribute on Patch metadata models

  • The PatchAttrs Pydantic model defines a dims field (type CommaSeparatedStr) in its schema, so passing dims via attr_cls(**info) is supported as documented in the Patch tutorial (dascore.org).
  • The file‐indexing model PatchFileSummary likewise accepts a dims argument (e.g. in tests: PatchFileSummary(d_time=10, dims="time,distance")), so adding info["dims"] = dims will not raise a TypeError.
tests/test_io/test_prodml/test_prod_ml.py (1)

13-13: Importing get_coord here is appropriate.

The test uses get_coord solely to synthesize a plausible end time; looks good.

Comment on lines 65 to 72

      time_array = node["RawDataTime"]
      time_attrs = time_array.attrs
      start_str = unbyte(time_attrs["PartStartTime"]).split("+")[0]
      start = dc.to_datetime64(start_str.rstrip("Z"))
      end_str = unbyte(time_attrs["PartEndTime"]).split("+")[0]
      end = dc.to_datetime64(end_str.rstrip("Z"))
  -   step = (end - start) / (len(node["RawDataTime"]) - 1)
  -   return get_coord(start=start, stop=end + step, step=step, units="s")
  +   step = (end - start) / (len(time_array) - 1)
  +   time_coord = get_coord(start=start, stop=end + step, step=step, units="s")
⚠️ Potential issue

Guard against len==0/1 and compute step robustly.

If RawDataTime has 0 or 1 samples, (len(time_array) - 1) yields 0 or -1 causing incorrect step or division by zero later. Return a coord derived from the array in that degenerate case before computing step.

Apply this diff:

-    time_array = node["RawDataTime"]
-    time_attrs = time_array.attrs
+    time_array = node["RawDataTime"]
+    time_attrs = time_array.attrs
@@
-    step = (end - start) / (len(time_array) - 1)
-    time_coord = get_coord(start=start, stop=end + step, step=step, units="s")
+    n = len(time_array)
+    if n < 2:
+        # Degenerate case: single timestamp or empty -> trust the stored array.
+        return get_coord(data=time_array[:].astype("datetime64[us]"))
+    step = (end - start) / (n - 1)
+    time_coord = get_coord(start=start, stop=end + step, step=step, units="s")

@d-chambers d-chambers (Contributor, Author) commented:

Note: I did some ad-hoc profiling to make sure these changes don't slow down the ProdML parser and the difference appears to be negligible.
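The comment above does not show how the profiling was done; a minimal harness for such an ad-hoc before/after comparison might look like this sketch, where the function under test is a placeholder (in this PR's context it would wrap something like dc.read on a sample ProdML file):

```python
import statistics
import timeit

def median_runtime(func, repeat=20):
    """Return the median wall-clock time of func() over several runs, in seconds.

    Using the median rather than the mean reduces the influence of one-off
    outliers (cold caches, background load) in quick-and-dirty comparisons.
    """
    return statistics.median(timeit.repeat(func, number=1, repeat=repeat))
```

Running this against the parser before and after the change would show whether the extra drift check measurably slows reads.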

@d-chambers d-chambers added the bug Something isn't working label Aug 22, 2025
@d-chambers d-chambers merged commit 72c7878 into master Aug 22, 2025
21 checks passed
@d-chambers d-chambers deleted the fix_514 branch August 22, 2025 14:41