handle incorrectly sized sintela binary + keyboard interrupt in scan #520
Conversation
Walkthrough

Adds optional rich Progress support to the progress utilities and scan; scan now handles KeyboardInterrupt, unknown formats, corrupt files, and missing optional dependencies, with fiber_io_hint caching; the Sintela binary utils now raise InvalidFiberFileError for mis-sized files; tests were added for interrupts and corrupted Sintela files.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant U as User
    participant SC as dc.scan
    participant PR as Progress (get_progress_instance / track)
    participant RM as IOResourceManager
    participant FI as fiber_io
    participant FS as Filesystem
    U->>SC: scan(patch_sources, progress)
    SC->>PR: get_progress_instance(progress)
    PR-->>SC: Progress instance
    loop each patch_source
        SC->>RM: enter IOResourceManager(patch_source)
        RM-->>SC: (fiber_io, resource) or UnknownFiberFormatError
        alt Unknown format
            SC-->>SC: log/skip source
        else Got fiber_io
            SC->>SC: cache fiber_io by input_type
            alt resource is directory
                SC->>FI: fiber_io.scan(directory)
                FI-->>SC: attrs or MissingOptionalDependencyError
            else resource is file
                SC->>FI: fiber_io.scan(file)
                alt Error (OSError / InvalidFiberFileError / ValueError / TypeError)
                    FI-->>SC: error -> warn and continue
                else MissingOptionalDependencyError
                    FI-->>SC: record missing dep and continue
                else Success
                    FI-->>SC: attrs
                end
            end
            SC-->>SC: append attrs to output
        end
    end
    opt KeyboardInterrupt from Progress
        PR-->>SC: raises KeyboardInterrupt
        SC->>PR: stop() if supported
        SC-->>U: propagate KeyboardInterrupt
    end
    SC-->>U: return list[PatchAttrs]
```
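The error-handling loop in the diagram can be reduced to a plain-Python sketch. The names below (`scan_all`, `fake_scan`, `UnknownFormat`, `MissingDep`) are illustrative stand-ins, not dascore's actual API:

```python
from collections import Counter


class UnknownFormat(Exception):
    """Stand-in for UnknownFiberFormatError."""


class MissingDep(Exception):
    """Stand-in for MissingOptionalDependencyError."""


def scan_all(sources, scan_one):
    """Scan each source, skipping bad files and tallying missing deps."""
    out, missing = [], Counter()
    for src in sources:
        try:
            out.append(scan_one(src))
        except UnknownFormat:
            continue  # unrecognized format: skip the source
        except MissingDep as ex:
            missing[str(ex)] += 1  # record the missing dep and keep going
        except (OSError, ValueError, TypeError):
            continue  # corrupt file: warn-and-continue in the real code
    return out, missing


def fake_scan(src):
    if src == "mystery.bin":
        raise UnknownFormat(src)
    if src == "corrupt.raw":
        raise ValueError("bad bytes")
    if src == "needs_dep.h5":
        raise MissingDep("h5py")
    return {"src": src}


attrs, missing = scan_all(
    ["a.raw", "mystery.bin", "corrupt.raw", "needs_dep.h5", "b.raw"], fake_scan
)
assert [a["src"] for a in attrs] == ["a.raw", "b.raw"]
assert missing["h5py"] == 1
```

One bad file no longer aborts the whole scan; only a `KeyboardInterrupt` (deliberately not caught here) would propagate.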
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##           master     #520   +/-   ##
=======================================
  Coverage   99.87%   99.87%
=======================================
  Files         121      121
  Lines        9874     9887    +13
=======================================
+ Hits         9862     9875    +13
  Misses         12       12
```

Flags with carried forward coverage won't be shown.
Actionable comments posted: 4
🧹 Nitpick comments (12)
tests/test_io/test_io_core.py (1)
14-17: Minor: redundant imports (dascore and dc)

Both dascore and dascore as dc are imported; only one is necessary. You can use dc for both attribute patching and functional calls. Apply this diff for brevity:

```diff
-import rich.progress as prog
-
-import dascore
-import dascore as dc
+import rich.progress as prog
+import dascore as dc
```

And adjust the monkeypatch line below:

```diff
-    monkeypatch.setattr(dascore, "_debug", False)
+    monkeypatch.setattr(dc, "_debug", False)
```

tests/test_io/test_sintela_binary/test_sintela_binary.py (1)
18-35: Solid negative test; consider asserting on the message and adding a positive control

This proves mis-sized files raise InvalidFiberFileError. Two optional enhancements:

- Assert the error message contains a hint (e.g., "not divisible by packet size") to lock in UX.
- Add a positive control test using the unmodified .raw file to assert scan succeeds, guarding against false positives in fixture setup.

Example tweak inside the test:

```diff
-    with pytest.raises(InvalidFiberFileError):
+    with pytest.raises(InvalidFiberFileError, match="divisible|packet size"):
         fiber_io.scan(extra_bytes_file)
```

dascore/utils/progress.py (2)
57-61: Docstring nit: clarify that a Progress instance or subclass is accepted

Minor clarity: explicitly say "instance or subclass" to reflect the behavior after the above change. Suggested wording:

- "…can also accept a rich.progress.Progress instance or subclass."

Also, "emmit" -> "emit" below (typo). To update the typo outside the changed hunk, adjust this line:

```python
# The minimum length to emit a progress bar.
```

36-37: Optional: consider constructing columns via compat to keep all rich touchpoints centralized

You import Progress from compat but still construct columns via rich.progress as prog. This is fine, but if you're aiming to isolate rich behind compat, consider mirroring columns or providing a small factory there. Not required for this PR.

dascore/io/core.py (8)
23-23: Decouple runtime from rich by avoiding a hard Progress import

With `from __future__ import annotations`, you can reference Progress as a string literal and avoid importing it at runtime. This keeps rich fully optional and prevents an accidental ImportError at import time if rich isn't installed. It also aligns with the type-only usage here. Apply:

```diff
-from dascore.compat import Progress
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from dascore.compat import Progress
```

And update the annotation to a forward reference:

```diff
-def scan(..., progress: PROGRESS_LEVELS | Progress = "standard", ...) -> list[dc.PatchAttrs]:
+def scan(..., progress: PROGRESS_LEVELS | "Progress" = "standard", ...) -> list[dc.PatchAttrs]:
```

Also applies to: 790-790
363-365: Catching Exception in _get_format risks masking bugs; add debug logging

Swallowing every exception improves resilience to third-party FiberIOs, but completely silences programmer errors. Suggest logging at debug level to aid diagnosis without spamming users. Minimal change:

```diff
-        except Exception:
-            continue
+        except Exception as ex:
+            # Keep robust, but surface details in debug logs.
+            import logging as _logging
+            _logging.getLogger(__name__).debug(
+                "get_format suppressed for %s: %r",
+                getattr(fiber_io, "name", "?"),
+                ex,
+            )
+            continue
```
811-813: Docstring typo: "acceted" → "accept"; clarify phrasing

Small doc polish for correctness.

```diff
-        "basic" is best for low latency scenarios. Can also acceted a subclass
-        of rich.progress.Progress.
+        "basic" is best for low-latency scenarios. Can also accept a subclass
+        of rich.progress.Progress.
```
882-887: Include exception detail in warning for failed scans

Currently the warning hides the cause, making triage harder. Appending the exception message keeps logs actionable without being noisy.

```diff
-        except (OSError, InvalidFiberFileError, ValueError, TypeError):
-            warnings.warn(f"Failed to scan {resource}", UserWarning)
+        except (OSError, InvalidFiberFileError, ValueError, TypeError) as ex:
+            warnings.warn(f"Failed to scan {resource}: {ex}", UserWarning)
             continue
```
888-890: Deriving the package name from ex.msg is brittle

ImportError/MissingOptionalDependencyError usually carries .name; parsing .msg can misidentify the package. Prefer .name, then fall back to message parsing.

```diff
-        except MissingOptionalDependencyError as ex:
-            missing_optional_deps[ex.msg.split(" ")[0]] += 1
+        except MissingOptionalDependencyError as ex:
+            pkg = getattr(ex, "name", None)
+            if not pkg:
+                # Fallback: first token from message or str(ex).
+                txt = getattr(ex, "msg", "") or str(ex)
+                pkg = (txt.split()[0] if txt else "unknown")
+            missing_optional_deps[pkg] += 1
             continue
```
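A quick stdlib check shows why `.name` is the safer source; the module name below is deliberately nonexistent:

```python
# ImportError (and ModuleNotFoundError) carry the missing module in .name,
# set by the interpreter itself, so no message parsing is needed.
try:
    import not_a_real_package_xyz  # deliberately nonexistent module
except ImportError as ex:
    pkg = getattr(ex, "name", None) or str(ex).split()[0]

assert pkg == "not_a_real_package_xyz"
```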
891-892: Avoid re-parsing PatchAttrs objects

If source already yields PatchAttrs, re-creating via from_dict is unnecessary.

```diff
-            for attr in source:
-                out.append(dc.PatchAttrs.from_dict(attr))
+            for attr in source:
+                out.append(attr if isinstance(attr, dc.PatchAttrs)
+                           else dc.PatchAttrs.from_dict(attr))
```
893-895: Comment typo

Minor typo and casing.

```diff
-    # Ensure ctl + c exists scan.
+    # Ensure Ctrl+C exits scan.
```
654-662: Align scan_to_df progress type with scan's new annotation

scan_to_df passes progress through to scan, but its type hint still excludes a Progress instance. Align for consistency.

```diff
-def scan_to_df(..., progress: PROGRESS_LEVELS = "standard", ...) -> pd.DataFrame:
+def scan_to_df(..., progress: PROGRESS_LEVELS | "Progress" = "standard", ...) -> pd.DataFrame:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
- dascore/compat.py (1 hunks)
- dascore/io/core.py (6 hunks)
- dascore/io/sintela_binary/utils.py (3 hunks)
- dascore/utils/progress.py (4 hunks)
- tests/test_io/test_io_core.py (2 hunks)
- tests/test_io/test_sintela_binary/test_sintela_binary.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
tests/test_io/test_sintela_binary/test_sintela_binary.py (3)
- dascore/exceptions.py (1): InvalidFiberFileError (14-15)
- dascore/io/sintela_binary/core.py (1): SintelaBinaryV3 (30-80)
- dascore/utils/downloader.py (1): fetch (40-58)

dascore/compat.py (1)
- tests/test_io/test_io_core.py (1): Progress (382-387)

tests/test_io/test_io_core.py (2)
- dascore/utils/progress.py (1): track (39-84)
- dascore/io/core.py (2): scan (483-500), scan (784-898)

dascore/io/sintela_binary/utils.py (2)
- dascore/exceptions.py (1): InvalidFiberFileError (14-15)
- dascore/utils/misc.py (1): get_buffer_size (751-768)

dascore/utils/progress.py (2)
- tests/test_io/test_io_core.py (2): Progress (382-387), track (385-387)
- dascore/io/indexer.py (2): update (90-95), update (309-346)

dascore/io/core.py (3)
- tests/test_io/test_io_core.py (1): Progress (382-387)
- dascore/exceptions.py (3): InvalidFiberFileError (14-15), UnknownFiberFormatError (18-19), MissingOptionalDependencyError (98-99)
- dascore/utils/io.py (2): IOResourceManager (89-137), source (98-105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: test_code_min_deps (macos-latest, 3.12)
- GitHub Check: test_code (windows-latest, 3.11)
- GitHub Check: test_code_min_deps (windows-latest, 3.13)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
- GitHub Check: test_code (windows-latest, 3.12)
- GitHub Check: test_code_min_deps (macos-latest, 3.13)
- GitHub Check: test_code_min_deps (windows-latest, 3.12)
- GitHub Check: test_code (macos-latest, 3.11)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
- GitHub Check: test_code (windows-latest, 3.10)
- GitHub Check: test_code (macos-latest, 3.10)
🔇 Additional comments (5)
dascore/compat.py (1)

15-15: rich is already a declared dependency; no fallback needed

The search shows "rich", at line 53 in your project's dependency file, confirming that rich is an explicit runtime requirement. The guarded import shim is therefore unnecessary.

tests/test_io/test_io_core.py (1)

379-395: Test coverage looks good; it exercises the exact regression around Ctrl+C and Progress integration

The test cleanly validates propagation of KeyboardInterrupt through the progress path and ensures the bar is active by flipping _debug. Nicely scoped.

dascore/io/core.py (3)
36-36: Good: add the InvalidFiberFileError import

Pulling in InvalidFiberFileError enables robust handling of mis-sized or corrupt files during scan. Matches the PR intent.

848-896: Graceful Ctrl+C handling looks good

Wrapping the scan loop and stopping progress ensures a clean exit and matches the new test behavior. Re-raising preserves the expected KeyboardInterrupt semantics.
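The stop-then-reraise pattern can be sketched without rich at all; `FakeProgress` and `run_scan` below are illustrative, not dascore code:

```python
class FakeProgress:
    """Minimal stand-in for a rich Progress object."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


def run_scan(items, progress):
    """Iterate items; on Ctrl+C, stop the bar cleanly and re-raise."""
    out = []
    try:
        for item in items:
            if item == "boom":
                raise KeyboardInterrupt  # simulate Ctrl+C mid-scan
            out.append(item)
    except KeyboardInterrupt:
        if hasattr(progress, "stop"):
            progress.stop()  # leave the terminal in a clean state
        raise  # preserve normal Ctrl+C semantics for callers
    return out


prog = FakeProgress()
interrupted = False
try:
    run_scan(["a", "boom", "b"], prog)
except KeyboardInterrupt:
    interrupted = True

assert interrupted and prog.stopped
```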
871-873: Skip protocol verified; no changes needed

I reviewed the _iter_filesystem and _iterate_scan_inputs implementations:

- _iter_filesystem explicitly handles a "skip" signal right after yielding a directory, yielding None and then returning to stop recursion.
- _iterate_scan_inputs uses yield from _iter_filesystem(...), which per PEP 380 correctly forwards .send("skip") calls into the subgenerator.
- The code in dascore/io/core.py sends "skip" on the outer generator (not the track wrapper), ensuring the signal reaches _iter_filesystem.
- There is a dedicated test (test_skip_signal_directory) confirming that anything after a "skip"-sent directory is indeed skipped.

Since the end-to-end protocol is already implemented and covered by tests, no action is required here.
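The PEP 380 forwarding this relies on is easy to verify in isolation with toy generators (these are not the dascore functions):

```python
def inner():
    # A value passed to .send() on the *outer* generator arrives here.
    got = yield "first"
    yield f"saw:{got}"


def outer():
    # PEP 380: `yield from` forwards .send() (and .throw()) to inner().
    yield from inner()


g = outer()
assert next(g) == "first"
assert g.send("skip") == "saw:skip"
```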
```diff
 def _get_number_of_packets(fid, header, size):
     """Get the number of packets in the file."""
     # get filesize without depending on input being a file.
     file_size = get_buffer_size(fid)
     samples = header["num_samples"]
     channels = header["num_channels"]
-    assert version == "3", "Only version 3 of Sintela binary currently supported."
     packet_size = size + (channels * samples * 4)
     num_packets, remainder = divmod(file_size, packet_size)
-    assert not remainder, "Sintela binary file size not divisible by packet size."
+    if remainder:
+        msg = "Sintela binary file size not divisible by packet size."
+        raise InvalidFiberFileError(msg)
     return num_packets
```
🛠️ Refactor suggestion
Avoid magic constant for sample size; derive from dtype and rename param for clarity
packet_size = size + (channels * samples * 4) bakes in 4 bytes/sample. This couples the packet math to V3’s float32 and will silently break if dtype ever changes. Also, the size parameter is actually the header size — name it accordingly.
Apply this diff to use the header’s dtype and clarify naming:
```diff
-def _get_number_of_packets(fid, header, size):
+def _get_number_of_packets(fid, header, header_size):
     """Get the number of packets in the file."""
     # get filesize without depending on input being a file.
     file_size = get_buffer_size(fid)
     samples = header["num_samples"]
     channels = header["num_channels"]
-    packet_size = size + (channels * samples * 4)
+    # Derive bytes/sample from header["dtype"] to avoid magic constants.
+    dtype_size = np.dtype(header["dtype"]).itemsize
+    packet_size = header_size + (channels * samples * dtype_size)
     num_packets, remainder = divmod(file_size, packet_size)
     if remainder:
         msg = "Sintela binary file size not divisible by packet size."
         raise InvalidFiberFileError(msg)
     return num_packets
```

Note: this assumes header["dtype"] is populated before calling this function; see the next comment for the ordering fix.
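The dtype-driven packet math can be demonstrated with the stdlib alone; here `struct.calcsize("<f")` stands in for `np.dtype("<f4").itemsize`, and the sizes are made-up illustration values:

```python
import struct


def packet_count(file_size, header_size, channels, samples, fmt="<f"):
    """Compute packet count, deriving bytes/sample from a format string."""
    itemsize = struct.calcsize(fmt)  # 4 bytes for little-endian float32
    packet_size = header_size + channels * samples * itemsize
    num_packets, remainder = divmod(file_size, packet_size)
    if remainder:
        raise ValueError("file size not divisible by packet size")
    return num_packets


# Two packets: 64-byte header + 8 channels * 100 samples * 4 bytes each.
assert packet_count(2 * (64 + 8 * 100 * 4), 64, 8, 100) == 2
```

A trailing byte makes `divmod` return a nonzero remainder, which is exactly the mis-sized-file condition the PR turns into `InvalidFiberFileError`.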
🤖 Prompt for AI Agents
In dascore/io/sintela_binary/utils.py around lines 109 to 121, the function uses
a hardcoded 4 bytes/sample and a misleading parameter name `size`; replace the
magic constant by deriving the per-sample byte size from header["dtype"] (e.g.
numpy.dtype(header["dtype"]).itemsize) and rename the `size` parameter to
`header_size` (and update call sites) so packet_size is computed as header_size
+ (channels * samples * itemsize); also ensure callers populate header["dtype"]
before this function is invoked.
```python
    assert version == "3", "only 3 support for now,"
    header["num_packets"] = _get_number_of_packets(fid, header, header_size)
    header["dtype"] = "<f4"
```
💡 Verification agent
🧩 Analysis chain
Replace asserts with explicit exceptions and set dtype before computing packet count
- assert version == "3" is a runtime assertion that can be stripped with -O and yields a generic AssertionError. Prefer explicit, stable exceptions with actionable messages.
- Call _get_number_of_packets after setting header["dtype"] so packet sizing uses the derived itemsize.
Apply this diff:

```diff
-    assert version == "3", "only 3 support for now,"
-    header["num_packets"] = _get_number_of_packets(fid, header, header_size)
-    header["dtype"] = "<f4"
+    if version != "3":
+        raise InvalidFiberFileError(
+            f"Unsupported Sintela binary version: {version}. Only '3' is supported."
+        )
+    header["dtype"] = "<f4"
+    header["num_packets"] = _get_number_of_packets(fid, header, header_size)
```

Additionally, consider replacing the earlier size assertion with a proper exception for consistency and better user feedback:
```python
# Replace:
assert header_size == expected_size
# With:
if header_size != expected_size:
    raise InvalidFiberFileError(
        f"Header size {header_size} does not match expected {expected_size} for version {version}."
    )
```

Please run a quick search to confirm _get_number_of_packets isn't called elsewhere with the old signature:
🏁 Script executed:

```bash
rg -nP --glob '!**/site-packages/**' '\b_get_number_of_packets\s*\(' -C2
```

Length of output: 862

🏁 Script executed:

```bash
# Locate and display the start of the Sintela binary utils file for inspection.
file="dascore/io/sintela_binary/utils.py"
if [ -f "$file" ]; then
  echo "Found $file; printing lines 1-200:"
  sed -n '1,200p' "$file"
else
  echo "File not found at expected path. Searching for any utils.py in sintela_binary:"
  find . -type f | grep 'sintela_binary/utils.py'
fi
```

Length of output: 6706
Enforce explicit version checking and reorder dtype assignment

The only call to _get_number_of_packets is in _read_remaining_header, so updating this block won't break elsewhere. Replace both assert statements with InvalidFiberFileError and set header["dtype"] before computing the packet count to ensure the correct itemsize is used.

- File: dascore/io/sintela_binary/utils.py
- Function: _read_remaining_header
- Lines: ~125-135

```diff
-    assert header_size == expected_size
+    if header_size != expected_size:
+        raise InvalidFiberFileError(
+            f"Header size {header_size} does not match expected {expected_size} for version {version}."
+        )
-    assert version == "3", "only 3 support for now,"
-    header["num_packets"] = _get_number_of_packets(fid, header, header_size)
-    header["dtype"] = "<f4"
+    if version != "3":
+        raise InvalidFiberFileError(
+            f"Unsupported Sintela binary version: {version}. Only '3' is supported."
+        )
+    header["dtype"] = "<f4"
+    header["num_packets"] = _get_number_of_packets(fid, header, header_size)
```
+ header["num_packets"] = _get_number_of_packets(fid, header, header_size)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| assert version == "3", "only 3 support for now," | |
| header["num_packets"] = _get_number_of_packets(fid, header, header_size) | |
| header["dtype"] = "<f4" | |
| # Previously: | |
| # expected_size = ... | |
| # assert header_size == expected_size | |
| # assert version == "3", "only 3 support for now," | |
| # header["num_packets"] = _get_number_of_packets(fid, header, header_size) | |
| # header["dtype"] = "<f4" | |
| if header_size != expected_size: | |
| raise InvalidFiberFileError( | |
| f"Header size {header_size} does not match expected {expected_size} for version {version}." | |
| ) | |
| if version != "3": | |
| raise InvalidFiberFileError( | |
| f"Unsupported Sintela binary version: {version}. Only '3' is supported." | |
| ) | |
| header["dtype"] = "<f4" | |
| header["num_packets"] = _get_number_of_packets(fid, header, header_size) |
🤖 Prompt for AI Agents
In dascore/io/sintela_binary/utils.py around lines 132 to 134, replace the two
assert statements with explicit InvalidFiberFileError raises (check version
explicitly and raise InvalidFiberFileError with a clear message if unsupported)
and assign header["dtype"] = "<f4" before calling _get_number_of_packets so
packet-size calculations use the correct itemsize; ensure the error is raised
(not asserted) and the dtype is set prior to computing header["num_packets"].
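The assert-vs-exception point is easy to demonstrate in isolation; `InvalidFiberFileError` below is a local stand-in for the dascore exception, and `check_version` is a simplified sketch of the suggested logic:

```python
class InvalidFiberFileError(ValueError):
    """Local stand-in for dascore.exceptions.InvalidFiberFileError."""


def check_version(version):
    # An explicit raise survives `python -O`, which strips assert statements
    # entirely, and gives callers a stable, catchable exception type instead
    # of a generic AssertionError.
    if version != "3":
        raise InvalidFiberFileError(
            f"Unsupported Sintela binary version: {version}. Only '3' is supported."
        )


check_version("3")  # accepted silently
message = ""
try:
    check_version("2")
except InvalidFiberFileError as err:
    message = str(err)

assert "Unsupported" in message
```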
```python
def get_progress_instance(progress: PROGRESS_LEVELS | Progress = "standard"):
    """
    Get the Rich progress bar instance based on complexity level.
    """
    # If a progress class is passed in, just use it.
    if isinstance(progress, Progress):
        return progress
```
🛠️ Refactor suggestion
Support both Progress instances and subclasses (classes) as input
Current logic only detects an instance of Progress. If a user passes a subclass (class object) intending you to construct it, we’ll ignore it and build the base Progress instead.
Apply this diff to handle both:

```diff
 def get_progress_instance(progress: PROGRESS_LEVELS | Progress = "standard"):
 @@
-    # If a progress class is passed in, just use it.
-    if isinstance(progress, Progress):
-        return progress
+    # Accept either an instance or a subclass (class) of Progress.
+    if isinstance(progress, Progress):
+        return progress
+    if isinstance(progress, type) and issubclass(progress, Progress):
+        ProgressClass = progress
+    else:
+        ProgressClass = Progress
 @@
     if progress == "basic":
         # set the refresh rate very low and eliminate the spinner
         kwargs["refresh_per_second"] = 0.25
         progress_list = progress_list[1:]
-    return Progress(*progress_list, **kwargs)
+    return ProgressClass(*progress_list, **kwargs)
```
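The instance-or-subclass dispatch can be verified with a dummy base class standing in for rich.progress.Progress (`resolve_progress` is an illustrative name, not the dascore helper):

```python
class Progress:
    """Minimal stand-in for rich.progress.Progress."""


class MyProgress(Progress):
    pass


def resolve_progress(progress):
    # Return an instance as-is, construct a passed-in subclass,
    # and fall back to the base class for anything else.
    if isinstance(progress, Progress):
        return progress
    if isinstance(progress, type) and issubclass(progress, Progress):
        return progress()
    return Progress()


inst = MyProgress()
assert resolve_progress(inst) is inst                        # instance passes through
assert isinstance(resolve_progress(MyProgress), MyProgress)  # class is constructed
assert type(resolve_progress("standard")) is Progress        # default fallback
```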
+ return ProgressClass(*progress_list, **kwargs)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def get_progress_instance(progress: PROGRESS_LEVELS | Progress = "standard"): | |
| """ | |
| Get the Rich progress bar instance based on complexity level. | |
| """ | |
| # If a progress class is passed in, just use it. | |
| if isinstance(progress, Progress): | |
| return progress | |
| def get_progress_instance(progress: PROGRESS_LEVELS | Progress = "standard"): | |
| """ | |
| Get the Rich progress bar instance based on complexity level. | |
| """ | |
| # Accept either an instance or a subclass (class) of Progress. | |
| if isinstance(progress, Progress): | |
| return progress | |
| if isinstance(progress, type) and issubclass(progress, Progress): | |
| ProgressClass = progress | |
| else: | |
| ProgressClass = Progress | |
| if progress == "basic": | |
| # set the refresh rate very low and eliminate the spinner | |
| kwargs["refresh_per_second"] = 0.25 | |
| progress_list = progress_list[1:] | |
| return ProgressClass(*progress_list, **kwargs) |
🤖 Prompt for AI Agents
In dascore/utils/progress.py around lines 15 to 21, the current check only
handles Progress instances and ignores when a Progress subclass (class object)
is passed; update the logic to also detect when progress is a class (use
inspect.isclass or isinstance(progress, type)) and issubclass(progress,
Progress), and in that case instantiate and return progress() (or
progress(*default_args) if needed); keep the existing instance check and string
handling as-is.
```diff
         yield from sequence
         return
-    update = 1.0 if progress == "standard" else 5.0
+    update = 1.0 if isinstance(progress, str) and progress == "standard" else 5.0
```
🛠️ Refactor suggestion
Align update rate semantics: only “basic” should slow refresh; instances/classes should get standard rate
Currently, any non-string progress (including instances) use the slower 5s update period. That’s surprising. Make “basic” the only slow case; everything else gets the standard 1s.
Apply this diff:

```diff
-    update = 1.0 if isinstance(progress, str) and progress == "standard" else 5.0
+    update = 5.0 if (isinstance(progress, str) and progress == "basic") else 1.0
```
+ update = 5.0 if (isinstance(progress, str) and progress == "basic") else 1.0📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| update = 1.0 if isinstance(progress, str) and progress == "standard" else 5.0 | |
| update = 5.0 if (isinstance(progress, str) and progress == "basic") else 1.0 |
🤖 Prompt for AI Agents
In dascore/utils/progress.py around line 76, the update rate logic currently
treats any non-string progress (including instances and classes) as the slow
5.0s case; change it so only the string "basic" uses 5.0s and everything else
(including progress instances or classes and other strings) uses the standard
1.0s. Replace the conditional to check specifically for isinstance(progress,
str) and progress == "basic" to set update = 5.0, otherwise set update = 1.0.
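The corrected semantics fit in a tiny function (`update_period` is an illustrative name, not the dascore helper):

```python
def update_period(progress):
    # Only the literal string "basic" slows the refresh; strings like
    # "standard", Progress instances, and classes all get the 1 s rate.
    return 5.0 if (isinstance(progress, str) and progress == "basic") else 1.0


assert update_period("basic") == 5.0
assert update_period("standard") == 1.0
assert update_period(object()) == 1.0  # instances get the standard rate
```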
Description

In one of our Sintela DAS datasets there is a small number of files whose size is not an integer number of packets. These files would cause the indexing of the entire directory to crash. This PR fixes that issue by checking for and raising the correct error when such a situation is found, then expanding the types of exceptions caught when scanning.

It also fixes a minor annoying "feature": Ctrl+C didn't work correctly to kill the progress bar while indexing.

Checklist

I have (if applicable):
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests