Skip to content

Conversation

@ahmadtourei
Copy link
Collaborator

@ahmadtourei ahmadtourei commented Sep 15, 2025

Description

This PR addresses #509 by adding a warning when a gap is detected in the spool during chunking.

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes.
  • documented the new feature with docstrings or appropriate doc page.
  • included a test. See testing guidelines.
  • your name has been added to the contributors page (docs/contributors.md).
  • added the "ready_for_review" tag once the PR is ready to be reviewed.

Summary by CodeRabbit

  • New Features
    • Emits a UserWarning when gaps are detected but data is still treated as contiguous due to configured tolerance.
  • Documentation
    • Clarified the tolerance parameter to specify the maximum allowable gap and explicitly define "gap."
  • Tests
    • Updated tests to expect the new warning when processing gappy data while preserving existing behavior checks.

@ahmadtourei ahmadtourei added the spool related to Spool class label Sep 15, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 15, 2025

Walkthrough

Docstring for BaseSpool.chunk clarified. ChunkManager now emits a UserWarning when a gap is detected but within the configured tolerance (message contains a literal "{self._tolerance}"). Tests updated to expect this warning when chunking gappy data. No API or behavioral logic changes.

Changes

Cohort / File(s) Summary of Changes
Core docstring update
dascore/core/spool.py
Clarified docstring for BaseSpool.chunk tolerance parameter to specify “maximum” gap and added “(gap)” clarification.
Chunk gap warning
dascore/utils/chunk.py
Added warnings import and emit UserWarning when any gap is detected but considered contiguous due to tolerance; warning text includes a literal "{self._tolerance}" placeholder. Logic for group numbering remains unchanged.
Tests adjusted for warning
tests/test_utils/test_chunk.py
Updated test to assert a UserWarning is raised when chunking gappy data (matching "There is a gap in the patch"). Duration verification remains unchanged.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch warning_chunk

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 825f962 and 55a1d38.

📒 Files selected for processing (1)
  • dascore/utils/chunk.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code (ubuntu-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.13)
  • GitHub Check: test_code (ubuntu-latest, 3.13)
  • GitHub Check: test_code (macos-latest, 3.12)
  • GitHub Check: test_code (macos-latest, 3.10)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code (ubuntu-latest, 3.10)
  • GitHub Check: test_code (ubuntu-latest, 3.11)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
  • GitHub Check: test_code_min_deps (macos-latest, 3.12)
  • GitHub Check: test_code (macos-latest, 3.11)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (macos-latest, 3.13)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
🔇 Additional comments (1)
dascore/utils/chunk.py (1)

5-5: LGTM: warnings import added where needed.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed The PR description follows the repository template by including "## Description" and the Checklist and clearly states the intent to address #509 by adding a warning when a gap is detected during chunking. It is concise and on-topic, giving reviewers the high-level purpose, but omits brief implementation details (e.g., which modules/tests were changed) and leaves checklist items unchecked. Because it links the issue and states the change, the description is mostly complete for review. Adding a short note about the modified files and the included test would improve clarity.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title Check ✅ Passed The title "warning for chunking gappy spools" is concise and accurately summarizes the primary change in the PR — adding a warning when chunking spools/patches that contain gaps — and matches the code and test updates in the summary.

@codecov
Copy link

codecov bot commented Sep 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.88%. Comparing base (13b2bf2) to head (55a1d38).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #538   +/-   ##
=======================================
  Coverage   99.88%   99.88%           
=======================================
  Files         122      122           
  Lines       10044    10048    +4     
=======================================
+ Hits        10032    10036    +4     
  Misses         12       12           
Flag Coverage Δ
unittests 99.88% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
dascore/core/spool.py (1)

135-136: Clarify tolerance units; it's a multiple of the sampling interval, not a count of samples.

Current phrasing suggests an integer “number of samples.” In practice, tolerance is a scalar multiplier applied to the sampling step (can be non‑integer). Recommend rewording for accuracy.

Apply this diff:

-            The maximum number of samples a block of data can be spaced (gap) and
-            still be considered contiguous.
+            The maximum multiple of the sampling interval that a gap may be and
+            still be considered contiguous (e.g., 1.5 allows gaps up to 1.5 times
+            the step along the chunking dimension).
tests/test_utils/test_chunk.py (1)

261-263: Good: test now asserts a warning on gappy input.

This covers the new behavior.

Also consider asserting the same warning for the unordered gappy case to avoid stray uncaptured warnings in stricter CI configs:

# in test_doesnt_merge_unordered_gappy_df
with pytest.warns(UserWarning, match=r"There is a gap in the patch"):
    _, out = cm.chunk(df)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13b2bf2 and 825f962.

📒 Files selected for processing (3)
  • dascore/core/spool.py (1 hunks)
  • dascore/utils/chunk.py (2 hunks)
  • tests/test_utils/test_chunk.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_utils/test_chunk.py (2)
dascore/core/spool.py (2)
  • chunk (111-158)
  • chunk (511-541)
dascore/utils/chunk.py (1)
  • chunk (428-470)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: test_code (macos-latest, 3.12)
  • GitHub Check: test_code (macos-latest, 3.10)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (ubuntu-latest, 3.10)
  • GitHub Check: test_code (ubuntu-latest, 3.11)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code (macos-latest, 3.13)
  • GitHub Check: test_code (ubuntu-latest, 3.13)
  • GitHub Check: test_code (windows-latest, 3.13)
  • GitHub Check: test_code (macos-latest, 3.11)
  • GitHub Check: test_code (ubuntu-latest, 3.12)
  • GitHub Check: test_code_min_deps (macos-latest, 3.13)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
🔇 Additional comments (1)
dascore/utils/chunk.py (1)

5-5: LGTM: warnings import added where needed.

Import is scoped appropriately in this module.

Comment on lines 204 to 211
if bool(np.asarray(has_gap).any()):
msg = (
"There is a gap in the patch. As a result, some patches in the "
"chunked spool may be unevenly sampled. However, they are "
"still considered contiguous as a tolerance of {self._tolerance} "
"was used in the chunk function."
)
warnings.warn(msg, UserWarning, stacklevel=2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix unformatted placeholder in warning and tighten check; consider better stacklevel.

  • {self._tolerance} is not interpolated; the message will literally show the braces.
  • has_gap.any() is simpler and avoids unnecessary numpy conversion.
  • stacklevel=2 points at an internal helper; bumping helps users see the call site (e.g., ChunkManager.chunk).

Apply this diff:

-        if bool(np.asarray(has_gap).any()):
-            msg = (
-                "There is a gap in the patch. As a result, some patches in the "
-                "chunked spool may be unevenly sampled. However, they are "
-                "still considered contiguous as a tolerance of {self._tolerance} "
-                "was used in the chunk function."
-            )
-            warnings.warn(msg, UserWarning, stacklevel=2)
+        if has_gap.any():
+            msg = (
+                f"There is a gap in the patch. As a result, some patches in the "
+                f"chunked spool may be unevenly sampled. However, they are still "
+                f"considered contiguous because a tolerance of {self._tolerance} "
+                f"times the {self._name}_step was used in the chunk function."
+            )
+            warnings.warn(msg, UserWarning, stacklevel=3)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In dascore/utils/chunk.py around lines 204 to 211, the warning message uses an
unformatted placeholder "{self._tolerance}" (so it prints literally), converts
has_gap to a numpy array unnecessarily, and uses stacklevel=2 which points to an
internal helper instead of the calling site; fix by changing the condition to
use has_gap.any(), format the message to interpolate self._tolerance (e.g.,
f-string or .format) so the actual tolerance value appears, and increase the
warnings.warn stacklevel (e.g., to 3 or 4) so the warning points at the user
call site such as ChunkManager.chunk.

@ahmadtourei ahmadtourei changed the title warning for gappy patches warning for chunking gappy spools Sep 15, 2025
Copy link
Contributor

@d-chambers d-chambers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@d-chambers
Copy link
Contributor

As an aside, code rabbit is starting to be quite helpful isn't it?

@ahmadtourei ahmadtourei merged commit c2c17a1 into master Sep 16, 2025
22 checks passed
@ahmadtourei ahmadtourei deleted the warning_chunk branch September 16, 2025 13:46
@ahmadtourei
Copy link
Collaborator Author

As an aside, code rabbit is starting to be quite helpful isn't it?

Yep!

@d-chambers d-chambers mentioned this pull request Sep 24, 2025
5 tasks
@coderabbitai coderabbitai bot mentioned this pull request Nov 20, 2025
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

spool related to Spool class

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants