⚡️ Speed up function `_have_compatible_abi` by 564% #3
Closed
codeflash-ai[bot] wants to merge 1 commit into optimization-attempt from
Conversation
The optimization achieves a **564% speedup** through three key changes that reduce computational overhead in architecture compatibility checking:

**1. Module-level `frozenset` for allowed architectures**
- Moved `allowed_archs` from a per-call `set` construction to a module-level `_ALLOWED_ARCHS` frozenset
- Eliminates repeated set creation overhead (76.5% of original runtime per profiler)
- Provides O(1) membership testing vs. O(n) generator expression with `any()`

**2. Early return pattern in ELF validation functions**
- Replaced chained `and` conditions with immediate `if`/`return False` statements in `_is_linux_armhf` and `_is_linux_i686`
- Avoids short-circuit evaluation overhead when conditions fail early
- More cache-friendly for the common case where ELF files don't match criteria

**3. Single-pass architecture scanning**
- Changed from multiple scans (membership tests + `any()` generator) to one `for` loop that returns immediately on first match
- Eliminates redundant iteration over the `archs` sequence
- Most effective for workloads where compatible architectures appear early in the list

**Impact on hot path usage**: The `platform_tags()` function calls `_have_compatible_abi` at the start of manylinux tag generation - a critical path for Python package installation. The optimization is particularly beneficial for:
- Large architecture lists (test cases show significant gains with 1000+ architectures)
- Cases where allowed architectures like "x86_64" appear early in the sequence
- Frequent package compatibility checks during installation workflows

The optimized version maintains identical behavior while reducing both memory allocations and CPU cycles, making it especially valuable in packaging workflows where this function may be called repeatedly.
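Changes 1 and 3 can be illustrated with a minimal sketch. It is not the code from this PR: the function names and the contents of the architecture set are hypothetical stand-ins, and the `armv7l`/`i686` ELF checks are left out entirely. Only `_ALLOWED_ARCHS`, `allowed_archs`, and `archs` are names taken from the description above.

```python
from typing import Sequence

# Hypothetical contents; the real set lives in src/packaging/_manylinux.py.
_ALLOWED_ARCHS = frozenset({"x86_64", "aarch64", "ppc64le", "s390x"})


def have_compatible_abi_per_call(archs: Sequence[str]) -> bool:
    """Before (sketch): rebuild the set on every call, then scan with any()."""
    allowed_archs = {"x86_64", "aarch64", "ppc64le", "s390x"}
    return any(arch in allowed_archs for arch in archs)


def have_compatible_abi_hoisted(archs: Sequence[str]) -> bool:
    """After (sketch): module-level frozenset, return on the first match."""
    for arch in archs:
        if arch in _ALLOWED_ARCHS:
            return True
    return False
```

Both variants return the same result for any input; the second one simply skips the per-call set construction and the generator object that `any()` consumes.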
Too much of an impact on readability, and I expect almost all the speedup comes from one (readable) change: pulling the set construction (which is really expensive) out of the function.
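One way to test that expectation is to time a variant that applies only the hoisting change and keeps the original `any()` expression. The sketch below uses the standard-library `timeit` module; the function names, the `measure` helper, and the set contents are hypothetical, and the numbers it prints will vary by machine and Python version.

```python
import timeit
from typing import Callable, Sequence

# Hypothetical contents; not the actual list from _manylinux.py.
_ALLOWED_ARCHS = frozenset({"x86_64", "aarch64", "ppc64le", "s390x"})


def per_call_set(archs: Sequence[str]) -> bool:
    """Builds the set inside the function on every call."""
    allowed_archs = {"x86_64", "aarch64", "ppc64le", "s390x"}
    return any(arch in allowed_archs for arch in archs)


def hoisted_set(archs: Sequence[str]) -> bool:
    """Only change: the set is a module-level constant."""
    return any(arch in _ALLOWED_ARCHS for arch in archs)


def measure(func: Callable[[Sequence[str]], bool],
            archs: Sequence[str],
            number: int = 100_000) -> float:
    """Total seconds for `number` calls of func(archs)."""
    return timeit.timeit(lambda: func(archs), number=number)


if __name__ == "__main__":
    archs = ["x86_64"]
    for func in (per_call_set, hoisted_set):
        print(f"{func.__name__}: {measure(func, archs):.4f} s")
```

If the gap between these two accounts for most of the reported speedup, the early-return and single-loop rewrites add little beyond it.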
Owner
Merged with changes upstream.
📄 564% (5.64x) speedup for `_have_compatible_abi` in `src/packaging/_manylinux.py`
⏱️ Runtime: 11.3 microseconds → 1.71 microseconds (best of 250 runs)
📝 Explanation and details
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
🔎 Concolic Coverage Tests and Runtime
- `codeflash_concolic_quk6vk0y/tmprk8af1vi/test_concolic_coverage.py::test__have_compatible_abi`
- `codeflash_concolic_quk6vk0y/tmprk8af1vi/test_concolic_coverage.py::test__have_compatible_abi_2`
- `codeflash_concolic_quk6vk0y/tmprk8af1vi/test_concolic_coverage.py::test__have_compatible_abi_3`

To edit these changes, `git checkout codeflash/optimize-_have_compatible_abi-miebhkz1` and push.