fix_514 #515

Conversation
Walkthrough

Refactors ProdML time handling to load RawDataTime into a local array, add drift detection and re-computation using datetime64[us] when PartEndTime appears inaccurate, and tweak attribute assembly. Removes one sentence from the module docstring. Adds a test that patches an HDF5 file's PartEndTime and verifies reading succeeds. No public API changes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor TestUser
    participant Reader as dc.read
    participant Utils as _get_time_coord
    participant H5 as HDF5 (ProdML)
    TestUser->>Reader: request read(file)
    Reader->>H5: open file, access Acquisition/Raw*/RawDataTime
    Reader->>Utils: compute time coordinate
    Utils->>H5: load time_array and attrs
    Utils->>Utils: compute step and initial time_coord
    alt small drift detected (0 < diff < 10)
        Utils->>Utils: cast time_array -> datetime64[us]
        Utils->>Utils: recompute time_coord via get_coord
        note right of Utils: Adjust PartEndTime-derived coordinate
    else no small drift
        Utils->>Utils: keep initial time_coord
    end
    Utils-->>Reader: return time_coord
    Reader-->>TestUser: return Patch with coords
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##           master     #515   +/-   ##
=======================================
  Coverage   99.85%   99.85%
=======================================
  Files         118      118
  Lines        9713     9724   +11
=======================================
+ Hits         9699     9710   +11
  Misses         14       14
=======================================
```
Actionable comments posted: 2
🧹 Nitpick comments (3)
tests/test_io/test_prodml/test_prod_ml.py (3)
7-7: h5py import is fine; consider importorskip if environments vary.

If CI ever runs without h5py, prefer `pytest.importorskip("h5py")` to avoid hard failures. Not blocking if h5py is guaranteed in test deps.
58-73: Fix issue reference and misleading comment; optionally normalize ISO formatting.

- Docstring says "See #412" but the PR addresses #514. Update to avoid confusion.
- The inline comment says "monkey patch dimensions," but this fixture patches PartEndTime. Update the comment.
- Optional: use `np.datetime_as_string(..., unit="us")` to ensure a consistent ISO-8601 string; some files expect precise formatting.

Apply this diff:

```diff
-    def issue_514_patch_path(self, tmp_path_factory):
-        """Make a patch with bad endtime metadata. See #412."""
+    def issue_514_patch_path(self, tmp_path_factory):
+        """Make a patch with bad end-time metadata. See #514."""
@@
-        # monkey patch dimensions to simulate issue.
+        # Monkey-patch end time to simulate bad PartEndTime metadata.
@@
-        new_time = str(time_coord.max() + time_coord.step * 2)
+        new_time = np.datetime_as_string(
+            time_coord.max() + time_coord.step * 2, unit="us"
+        )
```

Note: If you adopt `np.datetime_as_string`, add `import numpy as np` at the top of this file.
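For illustration, a hedged sketch of a helper that corrupts `PartEndTime` the way such a fixture does. The `Acquisition/Raw[0]/RawDataTime` path, the attribute names, and the `patch_part_end_time` helper are assumptions based on the ProdML layout discussed in this PR, not DASCore's actual test code.

```python
import h5py
import numpy as np

def patch_part_end_time(path, offset_us=20):
    """Shift PartEndTime in a ProdML-style HDF5 file to simulate bad metadata."""
    with h5py.File(path, "r+") as f:
        node = f["Acquisition/Raw[0]/RawDataTime"]  # assumed layout
        raw = node.attrs["PartEndTime"]
        if isinstance(raw, bytes):
            raw = raw.decode()
        end = np.datetime64(raw.rstrip("Z"), "us")
        bad_end = end + np.timedelta64(offset_us, "us")
        # Write back an ISO-8601 string with microsecond precision.
        node.attrs["PartEndTime"] = np.datetime_as_string(bad_end, unit="us") + "Z"
```

A reader that trusts `PartEndTime` blindly would then derive a time coordinate that disagrees with the stored RawDataTime samples, which is exactly the situation the new drift check handles.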
97-101: Test coverage LGTM; consider asserting behavior, not just type.

The test currently asserts the read returns a Patch. Consider also asserting the derived time coordinate matches the last element of RawDataTime (after your fix), or at least that the time length matches the RawDataTime length, to prove the end-time correction works.

Example follow-up assertion:

```python
time_array = dc.read(issue_514_patch_path)[0].coords.get_array("time")
assert time_array[-1] <= time_array.max()  # or a stricter equality if appropriate
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- `dascore/io/prodml/__init__.py` (0 hunks)
- `dascore/io/prodml/utils.py` (3 hunks)
- `tests/test_io/test_prodml/test_prod_ml.py` (3 hunks)
💤 Files with no reviewable changes (1)
- `dascore/io/prodml/__init__.py`
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_io/test_prodml/test_prod_ml.py (3)

- dascore/core/coords.py (3): `get_coord` (1450-1638), `data` (589-591), `max` (442-444)
- dascore/utils/downloader.py (1): `fetch` (40-58)
- dascore/io/core.py (2): `read` (469-478), `read` (584-648)

dascore/io/prodml/utils.py (3)

- dascore/utils/misc.py (1): `unbyte` (357-361)
- dascore/core/coords.py (3): `get_coord` (1450-1638), `max` (442-444), `data` (589-591)
- dascore/io/dasdae/utils.py (1): `_get_dims` (145-152)
🔇 Additional comments (3)
dascore/io/prodml/utils.py (2)

5-6: Import of NumPy looks appropriate.

Using NumPy here is justified for datetime arithmetic and array handling.

127-129: No changes needed: `dims` is a valid attribute on Patch metadata models.

- The `PatchAttrs` Pydantic model defines a `dims` field (type `CommaSeparatedStr`) in its schema, so passing `dims` via `attr_cls(**info)` is supported as documented in the Patch tutorial (dascore.org).
- The file-indexing model `PatchFileSummary` likewise accepts a `dims` argument (e.g. in tests: `PatchFileSummary(d_time=10, dims="time,distance")`), so adding `info["dims"] = dims` will not raise a `TypeError`.

tests/test_io/test_prodml/test_prod_ml.py (1)

13-13: Importing get_coord here is appropriate.

The test uses get_coord solely to synthesize a plausible end time; looks good.
| time_array = node["RawDataTime"] | ||
| time_attrs = time_array.attrs | ||
| start_str = unbyte(time_attrs["PartStartTime"]).split("+")[0] | ||
| start = dc.to_datetime64(start_str.rstrip("Z")) | ||
| end_str = unbyte(time_attrs["PartEndTime"]).split("+")[0] | ||
| end = dc.to_datetime64(end_str.rstrip("Z")) | ||
| step = (end - start) / (len(node["RawDataTime"]) - 1) | ||
| return get_coord(start=start, stop=end + step, step=step, units="s") | ||
| step = (end - start) / (len(time_array) - 1) | ||
| time_coord = get_coord(start=start, stop=end + step, step=step, units="s") |
Guard against len==0/1 and compute step robustly.

If RawDataTime has 0 or 1 samples, `(len(time_array) - 1)` yields 0 or -1, causing an incorrect step or a division by zero later. Return a coord derived from the array in that degenerate case before computing step.

Apply this diff:

```diff
-    time_array = node["RawDataTime"]
-    time_attrs = time_array.attrs
+    time_array = node["RawDataTime"]
+    time_attrs = time_array.attrs
@@
-    step = (end - start) / (len(time_array) - 1)
-    time_coord = get_coord(start=start, stop=end + step, step=step, units="s")
+    n = len(time_array)
+    if n < 2:
+        # Degenerate case: single timestamp or empty -> trust the stored array.
+        return get_coord(data=time_array[:].astype("datetime64[us]"))
+    step = (end - start) / (n - 1)
+    time_coord = get_coord(start=start, stop=end + step, step=step, units="s")
```
🤖 Prompt for AI Agents
In dascore/io/prodml/utils.py around lines 65 to 72, guard against RawDataTime
having length 0 or 1 by checking len(time_array) early and returning a
coordinate built directly from the values (or a single-point coord) when
degenerate instead of computing a step that divides by (len-1); otherwise
compute step as (end - start) / max(1, (len(time_array) - 1)) or explicitly
handle the normal case only when len > 1, then call get_coord with start,
stop=end + step, step and units="s" so you never divide by zero or produce an
incorrect negative step.
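The guard described in the prompt can be exercised in isolation with plain NumPy. The `safe_time_coord` helper below is hypothetical, standing in for the real `get_coord`-based code.

```python
import numpy as np

def safe_time_coord(time_array, start, end):
    """Evenly spaced time coordinate with a guard for n < 2 samples.

    With 0 or 1 samples, (n - 1) would be -1 or 0, producing a negative
    step or a zero division, so fall back to the stored timestamps.
    """
    n = len(time_array)
    if n < 2:
        return np.asarray(time_array, dtype="int64").astype("datetime64[us]")
    step = (end - start) / (n - 1)
    return start + np.arange(n) * step

t0, t1 = np.datetime64(0, "us"), np.datetime64(4, "us")
# Normal case: 5 samples spanning 4 microseconds -> last point is the end time.
assert safe_time_coord(np.zeros(5, dtype="int64"), t0, t1)[-1] == t1
# Degenerate case: a single sample comes back as-is, no ZeroDivisionError.
assert safe_time_coord(np.array([7], dtype="int64"), t0, t1)[0] == np.datetime64(7, "us")
```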
Note: I did some ad-hoc profiling to make sure these changes don't slow down the ProdML parser and the difference appears to be negligible.
Description

This PR fixes issue #514. The ProdML parser now checks the last element of the time array, and also tries to account for the possibility that the time array may not be in microseconds, making the parser more robust.
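One way the "may not be in microseconds" robustness could look: the hypothetical `last_time_as_datetime` below and its unit-guessing heuristic are illustrations only, not DASCore's actual implementation.

```python
import numpy as np

def last_time_as_datetime(raw_last, expected_end):
    """Interpret a raw epoch timestamp whose unit may not be microseconds.

    Try a few common epoch units and keep the interpretation that lands
    closest to the metadata-derived end time.
    """
    units = ("s", "ms", "us", "ns")
    candidates = [
        np.int64(raw_last).astype(f"datetime64[{u}]").astype("datetime64[us]")
        for u in units
    ]
    diffs = [abs((c - expected_end) / np.timedelta64(1, "s")) for c in candidates]
    return candidates[int(np.argmin(diffs))]
```

For example, a value stored in whole seconds still resolves to the correct instant, because the seconds interpretation lands far closer to the expected end time than any other unit.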
Checklist
I have (if applicable):
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation
Refactor