-
Notifications
You must be signed in to change notification settings - Fork 28
warning for chunking gappy spools #538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughDocstring for BaseSpool.chunk clarified. ChunkManager now emits a UserWarning when a gap is detected but within the configured tolerance (message contains a literal "{self._tolerance}"). Tests updated to expect this warning when chunking gappy data. No API or behavioral logic changes. Changes
✨ Finishing touches
🧪 Generate unit tests
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
🔇 Additional comments (1)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Pre-merge checks✅ Passed checks (3 passed)
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #538 +/- ##
=======================================
Coverage 99.88% 99.88%
=======================================
Files 122 122
Lines 10044 10048 +4
=======================================
+ Hits 10032 10036 +4
Misses 12 12
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
dascore/core/spool.py (1)
135-136: Clarify tolerance units; it's a multiple of the sampling interval, not a count of samples.Current phrasing suggests an integer “number of samples.” In practice, tolerance is a scalar multiplier applied to the sampling step (can be non‑integer). Recommend rewording for accuracy.
Apply this diff:
- The maximum number of samples a block of data can be spaced (gap) and - still be considered contiguous. + The maximum multiple of the sampling interval that a gap may be and + still be considered contiguous (e.g., 1.5 allows gaps up to 1.5 times + the step along the chunking dimension).tests/test_utils/test_chunk.py (1)
261-263: Good: test now asserts a warning on gappy input.This covers the new behavior.
Also consider asserting the same warning for the unordered gappy case to avoid stray uncaptured warnings in stricter CI configs:
# in test_doesnt_merge_unordered_gappy_df with pytest.warns(UserWarning, match=r"There is a gap in the patch"): _, out = cm.chunk(df)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
dascore/core/spool.py(1 hunks)dascore/utils/chunk.py(2 hunks)tests/test_utils/test_chunk.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_utils/test_chunk.py (2)
dascore/core/spool.py (2)
chunk(111-158)chunk(511-541)dascore/utils/chunk.py (1)
chunk(428-470)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
- GitHub Check: test_code (macos-latest, 3.12)
- GitHub Check: test_code (macos-latest, 3.10)
- GitHub Check: test_code (windows-latest, 3.11)
- GitHub Check: test_code (ubuntu-latest, 3.10)
- GitHub Check: test_code (ubuntu-latest, 3.11)
- GitHub Check: test_code (windows-latest, 3.12)
- GitHub Check: test_code_min_deps (windows-latest, 3.13)
- GitHub Check: test_code (windows-latest, 3.10)
- GitHub Check: test_code (macos-latest, 3.13)
- GitHub Check: test_code (ubuntu-latest, 3.13)
- GitHub Check: test_code (windows-latest, 3.13)
- GitHub Check: test_code (macos-latest, 3.11)
- GitHub Check: test_code (ubuntu-latest, 3.12)
- GitHub Check: test_code_min_deps (macos-latest, 3.13)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
- GitHub Check: test_code_min_deps (windows-latest, 3.12)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
🔇 Additional comments (1)
dascore/utils/chunk.py (1)
5-5: LGTM: warnings import added where needed.Import is scoped appropriately in this module.
dascore/utils/chunk.py
Outdated
| if bool(np.asarray(has_gap).any()): | ||
| msg = ( | ||
| "There is a gap in the patch. As a result, some patches in the " | ||
| "chunked spool may be unevenly sampled. However, they are " | ||
| "still considered contiguous as a tolerance of {self._tolerance} " | ||
| "was used in the chunk function." | ||
| ) | ||
| warnings.warn(msg, UserWarning, stacklevel=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix unformatted placeholder in warning and tighten check; consider better stacklevel.
- {self._tolerance} is not interpolated; the message will literally show the braces.
- has_gap.any() is simpler and avoids unnecessary numpy conversion.
- stacklevel=2 points at an internal helper; bumping helps users see the call site (e.g., ChunkManager.chunk).
Apply this diff:
- if bool(np.asarray(has_gap).any()):
- msg = (
- "There is a gap in the patch. As a result, some patches in the "
- "chunked spool may be unevenly sampled. However, they are "
- "still considered contiguous as a tolerance of {self._tolerance} "
- "was used in the chunk function."
- )
- warnings.warn(msg, UserWarning, stacklevel=2)
+ if has_gap.any():
+ msg = (
+ f"There is a gap in the patch. As a result, some patches in the "
+ f"chunked spool may be unevenly sampled. However, they are still "
+ f"considered contiguous because a tolerance of {self._tolerance} "
+ f"times the {self._name}_step was used in the chunk function."
+ )
+ warnings.warn(msg, UserWarning, stacklevel=3)Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In dascore/utils/chunk.py around lines 204 to 211, the warning message uses an
unformatted placeholder "{self._tolerance}" (so it prints literally), converts
has_gap to a numpy array unnecessarily, and uses stacklevel=2 which points to an
internal helper instead of the calling site; fix by changing the condition to
use has_gap.any(), format the message to interpolate self._tolerance (e.g.,
f-string or .format) so the actual tolerance value appears, and increase the
warnings.warn stacklevel (e.g., to 3 or 4) so the warning points at the user
call site such as ChunkManager.chunk.
d-chambers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
As an aside, code rabbit is starting to be quite helpful isn't it? |
Yep! |
Description
This PR addresses #509 by adding a warning when a gap is detected in the spool during chunking.
Checklist
I have (if applicable):
Summary by CodeRabbit