
Conversation

@d-chambers d-chambers (Contributor) commented Aug 21, 2025

Description

This PR fixes issue #514. The ProdML parser now checks the last element of the time array and also accounts for the possibility that the time array is not in microseconds, making the parser more robust.
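The logic described above can be sketched roughly as follows. This is a minimal illustration, not dascore's actual implementation: `corrected_times` and its arguments are hypothetical names, and the raw timestamps are assumed here to be integer epoch microseconds.

```python
import numpy as np

def corrected_times(raw, start, end):
    """Sketch: rebuild time values when PartEndTime disagrees with the raw array.

    raw: integer timestamps from RawDataTime (assumed epoch microseconds here);
    start/end: datetime64 values parsed from the PartStartTime/PartEndTime attrs.
    """
    n = len(raw)
    step = (end - start) / (n - 1)
    computed_max = start + step * (n - 1)
    last = np.datetime64(int(raw[-1]), "us")
    # How far off is the metadata-derived end, in units of one sample step?
    drift = abs((computed_max - last) / step)
    if 0 < drift < 10:
        # Small drift: trust the last element of the raw array instead.
        return raw.astype("datetime64[us]")
    return np.arange(start, end + step, step)
```

When the metadata is consistent, the evenly spaced coordinate is kept; when the stored end time drifts by a few samples, the raw array wins.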

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes.
  • documented the new feature with docstrings or appropriate doc page.
  • included a test. See testing guidelines.
  • added my name to the contributors page (docs/contributors.md).
  • added the "ready_for_review" tag once the PR is ready to be reviewed.

Summary by CodeRabbit

  • New Features

    • None
  • Bug Fixes

    • Improved robustness when reading ProdML files with inconsistent or drifting end-time metadata so time coordinates are accurate and files load reliably.
  • Tests

    • Added tests that simulate incorrect PartEndTime metadata to verify files still read correctly.
    • Introduced a fixture to patch sample data for end-time validation.
  • Documentation

    • Cleaned a module docstring by removing a vendor-specific statement.
  • Refactor

    • Streamlined internal time handling without changing the public API.

@coderabbitai coderabbitai bot (Contributor) commented Aug 21, 2025

Walkthrough

Refactors ProdML time handling to load RawDataTime into a local array, add drift detection and re-computation using datetime64[us] when PartEndTime appears inaccurate, and tweak attribute assembly. Removes one sentence from the module docstring. Adds a test that patches an HDF5 file’s PartEndTime and verifies reading succeeds. No public API changes.

Changes

Cohort / File(s) — Summary

  • ProdML utils: time coordinate handling (dascore/io/prodml/utils.py): Added numpy as np. _get_time_coord now reads time_array, derives attrs from it, computes step from len(time_array)-1, and adds drift correction: if there is a small mismatch between the computed max and the last timestamp, cast time_array to datetime64[us] and recompute time_coord. Removed merging time attrs into info and assigned info["dims"] = dims via an intermediate dims variable.
  • ProdML package docs (dascore/io/prodml/__init__.py): Removed a descriptive sentence from the module docstring; no functional or API changes.
  • Tests for PartEndTime edge case (tests/test_io/test_prodml/test_prod_ml.py): Added h5py and get_coord imports. A new fixture copies and patches prodml_2.0.h5, sets PartEndTime based on a recalculated end time, and a test (test_issue_514) reads the patched file and asserts a dc.Patch is returned, covering the PartEndTime drift scenario.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor TestUser
  participant Reader as dc.read
  participant Utils as _get_time_coord
  participant H5 as HDF5 (ProdML)

  TestUser->>Reader: request read(file)
  Reader->>H5: open file, access Acquisition/Raw*/RawDataTime
  Reader->>Utils: compute time coordinate
  Utils->>H5: load time_array and attrs
  Utils->>Utils: compute step and initial time_coord
  alt small drift detected (0 < diff < 10)
    Utils->>Utils: cast time_array -> datetime64[us]
    Utils->>Utils: recompute time_coord via get_coord
    note right of Utils: Adjust PartEndTime-derived coordinate
  else no small drift
    Utils->>Utils: keep initial time_coord
  end
  Utils-->>Reader: return time_coord
  Reader-->>TestUser: return Patch with coords

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I nibbled on timestamps, crunchy and fine,
A hop to the end time—was it out of line?
I twitched my ears, recalculated the trail,
Drift tamed softly, no chance to fail.
Now the patch reads true—thump-thump, hooray! 🥕⏱️



📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6955147 and 109222f.

📒 Files selected for processing (1)
  • dascore/io/prodml/utils.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • dascore/io/prodml/utils.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code (macos-latest, 3.12)
  • GitHub Check: test_code (ubuntu-latest, 3.10)
  • GitHub Check: test_code (ubuntu-latest, 3.11)
  • GitHub Check: test_code (macos-latest, 3.10)
  • GitHub Check: test_code (ubuntu-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (macos-latest, 3.11)
  • GitHub Check: test_code_min_deps (macos-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (macos-latest, 3.13)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)


@codecov codecov bot commented Aug 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.85%. Comparing base (2d2a7dd) to head (109222f).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #515   +/-   ##
=======================================
  Coverage   99.85%   99.85%           
=======================================
  Files         118      118           
  Lines        9713     9724   +11     
=======================================
+ Hits         9699     9710   +11     
  Misses         14       14           
Flag: unittests | Coverage Δ: 99.85% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@coderabbitai coderabbitai bot (Contributor) left a comment
Actionable comments posted: 2

🧹 Nitpick comments (3)
tests/test_io/test_prodml/test_prod_ml.py (3)

7-7: h5py import is fine; consider importorskip if environments vary.

If CI ever runs without h5py, prefer pytest.importorskip("h5py") to avoid hard failures. Not blocking if h5py is guaranteed in test deps.


58-73: Fix issue reference and misleading comment; optionally normalize ISO formatting.

  • Docstring says “See #412” but the PR addresses #514. Update to avoid confusion.
  • The inline comment says “monkey patch dimensions,” but this fixture patches PartEndTime. Update the comment.
  • Optional: use np.datetime_as_string(..., unit="us") to ensure a consistent ISO-8601 string; some files expect precise formatting.

Apply this diff:

-    def issue_514_patch_path(self, tmp_path_factory):
-        """Make a patch with bad endtime metadata. See #412."""
+    def issue_514_patch_path(self, tmp_path_factory):
+        """Make a patch with bad end-time metadata. See #514."""
@@
-            # monkey patch dimensions to simulate issue.
+            # Monkey-patch end time to simulate bad PartEndTime metadata.
@@
-            new_time = str(time_coord.max() + time_coord.step * 2)
+            new_time = np.datetime_as_string(
+                time_coord.max() + time_coord.step * 2, unit="us"
+            )

Note: If you adopt np.datetime_as_string, add import numpy as np at the top of this file.
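To illustrate the formatting point (this snippet is not from the PR): str() of a datetime64 echoes the value's own precision, while np.datetime_as_string with unit="us" always emits six fractional digits, which is why it gives more consistent strings.

```python
import numpy as np

t = np.datetime64("2020-01-01T00:00:00")  # second precision
print(str(t))                               # no fractional digits
print(np.datetime_as_string(t, unit="us"))  # always microsecond precision
```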


97-101: Test coverage LGTM; consider asserting behavior, not just type.

The test currently asserts the read returns a Patch. Consider also asserting the derived time coordinate matches the last element of RawDataTime (after your fix), or at least that time length matches RawDataTime length, to prove the end-time correction works.

Example follow-up assertion:

time_array = dc.read(issue_514_patch_path)[0].coords.get_array("time")
assert time_array[-1] <= time_array.max()  # or a stricter equality if appropriate
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 2d2a7dd and 6955147.

📒 Files selected for processing (3)
  • dascore/io/prodml/__init__.py (0 hunks)
  • dascore/io/prodml/utils.py (3 hunks)
  • tests/test_io/test_prodml/test_prod_ml.py (3 hunks)
💤 Files with no reviewable changes (1)
  • dascore/io/prodml/__init__.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_io/test_prodml/test_prod_ml.py (3)
dascore/core/coords.py (3)
  • get_coord (1450-1638)
  • data (589-591)
  • max (442-444)
dascore/utils/downloader.py (1)
  • fetch (40-58)
dascore/io/core.py (2)
  • read (469-478)
  • read (584-648)
dascore/io/prodml/utils.py (3)
dascore/utils/misc.py (1)
  • unbyte (357-361)
dascore/core/coords.py (3)
  • get_coord (1450-1638)
  • max (442-444)
  • data (589-591)
dascore/io/dasdae/utils.py (1)
  • _get_dims (145-152)
🔇 Additional comments (3)
dascore/io/prodml/utils.py (2)

5-6: Import of NumPy looks appropriate.

Using NumPy here is justified for datetime arithmetic and array handling.


127-129: No changes needed: dims is a valid attribute on Patch metadata models

  • The PatchAttrs Pydantic model defines a dims field (type CommaSeparatedStr) in its schema, so passing dims via attr_cls(**info) is supported as documented in the Patch tutorial (dascore.org).
  • The file‐indexing model PatchFileSummary likewise accepts a dims argument (e.g. in tests: PatchFileSummary(d_time=10, dims="time,distance")), so adding info["dims"] = dims will not raise a TypeError.
tests/test_io/test_prodml/test_prod_ml.py (1)

13-13: Importing get_coord here is appropriate.

The test uses get_coord solely to synthesize a plausible end time; looks good.

Comment on lines 65 to 72

      time_array = node["RawDataTime"]
      time_attrs = time_array.attrs
      start_str = unbyte(time_attrs["PartStartTime"]).split("+")[0]
      start = dc.to_datetime64(start_str.rstrip("Z"))
      end_str = unbyte(time_attrs["PartEndTime"]).split("+")[0]
      end = dc.to_datetime64(end_str.rstrip("Z"))
  -   step = (end - start) / (len(node["RawDataTime"]) - 1)
  -   return get_coord(start=start, stop=end + step, step=step, units="s")
  +   step = (end - start) / (len(time_array) - 1)
  +   time_coord = get_coord(start=start, stop=end + step, step=step, units="s")
⚠️ Potential issue

Guard against len==0/1 and compute step robustly.

If RawDataTime has 0 or 1 samples, (len(time_array) - 1) yields 0 or -1 causing incorrect step or division by zero later. Return a coord derived from the array in that degenerate case before computing step.

Apply this diff:

-    time_array = node["RawDataTime"]
-    time_attrs = time_array.attrs
+    time_array = node["RawDataTime"]
+    time_attrs = time_array.attrs
@@
-    step = (end - start) / (len(time_array) - 1)
-    time_coord = get_coord(start=start, stop=end + step, step=step, units="s")
+    n = len(time_array)
+    if n < 2:
+        # Degenerate case: single timestamp or empty -> trust the stored array.
+        return get_coord(data=time_array[:].astype("datetime64[us]"))
+    step = (end - start) / (n - 1)
+    time_coord = get_coord(start=start, stop=end + step, step=step, units="s")

@d-chambers d-chambers (Contributor, Author) commented:

Note: I did some ad-hoc profiling to make sure these changes don't slow down the ProdML parser and the difference appears to be negligible.
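The comment above does not show how the profiling was done; a minimal harness for such an ad-hoc before/after comparison might look like this sketch, where the function under test is a placeholder (in this PR's context it would wrap something like dc.read on a sample ProdML file):

```python
import statistics
import timeit

def median_runtime(func, repeat=20):
    """Return the median wall-clock time of func() over several runs, in seconds.

    Using the median rather than the mean reduces the influence of one-off
    outliers (cold caches, background load) in quick-and-dirty comparisons.
    """
    return statistics.median(timeit.repeat(func, number=1, repeat=repeat))
```

Running this against the parser before and after the change would show whether the extra drift check measurably slows reads.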

@d-chambers d-chambers added the bug Something isn't working label Aug 22, 2025
@d-chambers d-chambers merged commit 72c7878 into master Aug 22, 2025
21 checks passed
@d-chambers d-chambers deleted the fix_514 branch August 22, 2025 14:41