
perf(pack): Improve Intra-Structure Parallel Efficiency #14

Merged: TKanX merged 9 commits into main from feature/13-improve-intra-structure-parallel-efficiency on Apr 5, 2026.

@TKanX (Member) commented Apr 4, 2026

Summary:

Eliminated redundant work and parallel-dispatch overhead across all compute-bound packing phases. DEE time drops from 1.59 s to 0.81 s (−49%) thanks to worklist convergence and prebuilt per-slot neighbor caches. DP adjacency now uses a 64-bit-packed BitMatrix, which reduces the memory footprint and simplifies graph-algorithm signatures throughout. Pair energies are written directly into table slices, removing an intermediate allocation and a nested parallel loop. DB379 quality is unchanged (χ₁–₄ accuracy at 20° tolerance: 71.5%; RMSD: 0.728 Å), and all 401 tests pass.
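
The worklist convergence idea can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not dee.rs: it assumes a Goldstein-style singles criterion (the PR does not say which DEE criterion is used) and a hypothetical toy `Problem` with dense per-edge pair matrices in a `HashMap`. Only neighbors of a slot that lost candidates are re-enqueued, because pruning at slot i can only raise the per-neighbor minimum in a neighbor's criterion, while it only removes witnesses for slot i's own remaining candidates.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical toy problem, not dreid-pack's data model.
pub struct Problem {
    /// self_e[i][r]: self-energy of candidate r at slot i.
    pub self_e: Vec<Vec<f64>>,
    /// pair[&(i, j)][r][s]: energy of r at slot i with s at slot j.
    /// Both orientations (i, j) and (j, i) must be present.
    pub pair: HashMap<(usize, usize), Vec<Vec<f64>>>,
}

/// Goldstein singles test (assumed criterion): r at slot i is dominated by t if
/// E(r) - E(t) + sum_j min_{s alive at j} [E(r,s) - E(t,s)] > 0.
fn dominated(p: &Problem, alive: &[Vec<usize>], nbrs: &[usize], i: usize, r: usize, t: usize) -> bool {
    let mut gap = p.self_e[i][r] - p.self_e[i][t];
    for &j in nbrs {
        let m = &p.pair[&(i, j)];
        gap += alive[j]
            .iter()
            .map(|&s| m[r][s] - m[t][s])
            .fold(f64::INFINITY, f64::min);
    }
    gap > 0.0
}

/// Worklist DEE: returns the surviving candidate indices per slot.
pub fn dee_worklist(p: &Problem) -> Vec<Vec<usize>> {
    let n = p.self_e.len();
    // Neighbor lists are built once, before the loop, not once per round.
    let mut nbrs: Vec<Vec<usize>> = vec![Vec::new(); n];
    for &(i, j) in p.pair.keys() {
        nbrs[i].push(j);
    }
    // Alive candidates per slot, sorted by ascending self-energy so that
    // low-energy witnesses are tried first.
    let mut alive: Vec<Vec<usize>> = (0..n)
        .map(|i| {
            let mut v: Vec<usize> = (0..p.self_e[i].len()).collect();
            v.sort_by(|&a, &b| p.self_e[i][a].total_cmp(&p.self_e[i][b]));
            v
        })
        .collect();
    let mut queued = vec![true; n];
    let mut work: VecDeque<usize> = (0..n).collect();
    while let Some(i) = work.pop_front() {
        queued[i] = false;
        let mut pruned = false;
        let mut k = 0;
        while k < alive[i].len() {
            let r = alive[i][k];
            let hit = alive[i]
                .iter()
                .any(|&t| t != r && dominated(p, &alive, &nbrs[i], i, r, t));
            if hit {
                alive[i].remove(k); // Vec::remove keeps the sorted order.
                pruned = true;
            } else {
                k += 1;
            }
        }
        // Only neighbors of a slot that lost candidates can change status.
        if pruned {
            for &j in &nbrs[i] {
                if !queued[j] {
                    queued[j] = true;
                    work.push_back(j);
                }
            }
        }
    }
    alive
}
```

In a two-slot case where a high self-energy candidate at slot 1 is pruned first, slot 0 is re-enqueued and can then eliminate a candidate that was not prunable while that rotamer was alive.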

Changes:

  • DEE (dee.rs): Worklist convergence: each round rechecks only the slots neighboring a newly pruned slot and skips unchanged slots entirely. Alive candidates are cached per slot, sorted by ascending self-energy so that a witness is found earlier, and kept in sync with pruning; the graph is traversed once before the loop rather than once per round. Parallelism adapts via with_min_len.
  • DP (dp.rs): Vec<bool> adjacency replaced with BitMatrix (u64-packed rows), which is denser, more cache-friendly, and has a cleaner API; all graph-algorithm functions (mcs_order, is_peo, fill_in, etc.) are simplified accordingly. The standalone build_alive_table/topo_order helpers are inlined into the solve path. Separator DP uses adaptive with_min_len parallelism.
  • Energy table (energy.rs): PairEnergyTable::set() replaced by matrices_mut(), which returns non-overlapping mutable slices via split_at_mut for zero-copy parallel bulk writes.
  • Pair energy (pair.rs): compute() writes directly into table slices from matrices_mut(), eliminating an intermediate Vec. Inner rotamer loop made sequential within the per-edge parallel dispatch.
  • Prune (prune.rs): Removed redundant nested parallel iterator from frame energy computation.
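
To illustrate why u64-packed rows are denser and friendlier to graph algorithms than Vec<bool>, here is a hypothetical BitMatrix sketch; the type layout and method names are assumptions, not the crate's API. Each word holds 64 adjacency bits (a Vec<bool> spends a byte per entry), and neighborhood intersection becomes a word-wise AND, the kind of primitive that fill-in and elimination-order computations can build on.

```rust
/// Square bit matrix with each row packed into u64 words.
/// Hypothetical sketch of the idea, not dreid-pack's BitMatrix.
#[derive(Clone, Debug)]
pub struct BitMatrix {
    n: usize,
    words: usize, // u64 words per row
    bits: Vec<u64>,
}

impl BitMatrix {
    pub fn new(n: usize) -> Self {
        let words = (n + 63) / 64;
        BitMatrix { n, words, bits: vec![0; n * words] }
    }

    /// Add the undirected edge {i, j} (sets both (i, j) and (j, i)).
    pub fn set_sym(&mut self, i: usize, j: usize) {
        debug_assert!(i < self.n && j < self.n);
        self.bits[i * self.words + j / 64] |= 1u64 << (j % 64);
        self.bits[j * self.words + i / 64] |= 1u64 << (i % 64);
    }

    pub fn get(&self, i: usize, j: usize) -> bool {
        ((self.bits[i * self.words + j / 64] >> (j % 64)) & 1) == 1
    }

    /// Row i as a slice of packed words.
    pub fn row(&self, i: usize) -> &[u64] {
        &self.bits[i * self.words..(i + 1) * self.words]
    }

    /// Vertex degree = number of set bits in row i.
    pub fn degree(&self, i: usize) -> u32 {
        self.row(i).iter().map(|w| w.count_ones()).sum()
    }

    /// |N(i) ∩ N(j)| via word-wise AND, 64 columns per word.
    pub fn common_neighbors(&self, i: usize, j: usize) -> u32 {
        self.row(i)
            .iter()
            .zip(self.row(j))
            .map(|(a, b)| (a & b).count_ones())
            .sum()
    }

    /// Column indices set in row i, in ascending order.
    pub fn neighbors(&self, i: usize) -> impl Iterator<Item = usize> + '_ {
        self.row(i).iter().enumerate().flat_map(|(w, &word)| {
            let mut rem = word;
            std::iter::from_fn(move || {
                if rem == 0 {
                    return None;
                }
                let bit = rem.trailing_zeros() as usize;
                rem &= rem - 1; // clear the lowest set bit
                Some(w * 64 + bit)
            })
        })
    }
}
```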

@TKanX TKanX self-assigned this Apr 4, 2026
@TKanX TKanX added enhancement ✨ New feature or request performance ⚡ Performance improvements and code optimizations labels Apr 4, 2026
@TKanX TKanX linked an issue Apr 4, 2026 that may be closed by this pull request
Copilot AI review requested due to automatic review settings April 4, 2026 23:37
Copilot AI left a comment

Pull request overview

Performance-focused refactor of packing phases to reduce redundant work and parallel overhead, including denser graph representations and zero-copy pair-energy writes into the global tables.

Changes:

  • Reworked DEE to converge via a neighbor-driven worklist with cached per-slot alive candidates and precomputed neighbor-edge metadata.
  • Switched DP adjacency to a packed BitMatrix and refactored tree-decomposition DP to use cached per-node/per-edge info with adaptive parallelism.
  • Updated pair-energy computation and PairEnergyTable APIs to write directly into non-overlapping mutable matrix slices, removing intermediate allocations and nested parallelism.
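
The zero-copy write path described above can be sketched with `slice::split_at_mut`, under assumptions: a hypothetical `PairEnergyTable` that stores every edge matrix in one flat `Vec<f32>` with per-edge offsets, and `std::thread::scope` standing in for the per-edge parallel dispatch so the example needs no external crates. Because each edge receives its own disjoint `&mut [f32]`, workers can write concurrently without locks or an intermediate `Vec`.

```rust
/// Hypothetical layout: edge e's (rows x cols) matrix is stored row-major
/// in data[offsets[e]..offsets[e + 1]].
pub struct PairEnergyTable {
    data: Vec<f32>,
    offsets: Vec<usize>,       // len = number of edges + 1
    dims: Vec<(usize, usize)>, // (rows, cols) per edge
}

impl PairEnergyTable {
    pub fn new(dims: Vec<(usize, usize)>) -> Self {
        let mut offsets = Vec::with_capacity(dims.len() + 1);
        offsets.push(0);
        for &(r, c) in &dims {
            offsets.push(offsets.last().unwrap() + r * c);
        }
        let data = vec![0.0; *offsets.last().unwrap()];
        PairEnergyTable { data, offsets, dims }
    }

    /// One disjoint mutable slice per edge, carved out without copying.
    pub fn matrices_mut(&mut self) -> Vec<&mut [f32]> {
        let mut out = Vec::with_capacity(self.dims.len());
        let mut rest: &mut [f32] = &mut self.data;
        for e in 0..self.dims.len() {
            let len = self.offsets[e + 1] - self.offsets[e];
            // mem::take moves the reference out so the split halves keep
            // the full borrow lifetime.
            let (head, tail) = std::mem::take(&mut rest).split_at_mut(len);
            out.push(head);
            rest = tail;
        }
        out
    }

    pub fn get(&self, e: usize, r: usize, s: usize) -> f32 {
        self.data[self.offsets[e] + r * self.dims[e].1 + s]
    }
}

/// Fill every edge matrix in parallel; `energy(e, r, s)` is a hypothetical
/// scoring function. The inner rotamer loop is sequential per edge.
pub fn fill_parallel<F>(table: &mut PairEnergyTable, energy: F)
where
    F: Fn(usize, usize, usize) -> f32 + Sync,
{
    let dims = table.dims.clone();
    let energy = &energy;
    std::thread::scope(|scope| {
        for (e, slice) in table.matrices_mut().into_iter().enumerate() {
            let cols = dims[e].1;
            scope.spawn(move || {
                for (idx, out) in slice.iter_mut().enumerate() {
                    *out = energy(e, idx / cols, idx % cols);
                }
            });
        }
    });
}
```

Spawning one OS thread per edge is only for illustration; a work-stealing pool with a minimum chunk length (as rayon's `with_min_len` provides) avoids that overhead when there are many small edges.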

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Summary per file:

  • crates/dreid-pack/src/pack/phase/prune.rs: Removes redundant nested parallel iterator in frame-energy computation.
  • crates/dreid-pack/src/pack/phase/pair.rs: Writes pair energies directly into PairEnergyTable edge slices in a per-edge parallel loop.
  • crates/dreid-pack/src/pack/phase/dp.rs: Introduces BitMatrix adjacency and refactors elimination/DP execution and caching.
  • crates/dreid-pack/src/pack/phase/dee.rs: Implements worklist-based DEE convergence with cached alive sets and prebuilt edge metadata.
  • crates/dreid-pack/src/pack/model/spatial.rs: Comment spelling correction (“initialized”).
  • crates/dreid-pack/src/pack/model/energy.rs: Replaces per-entry set() with matrices_mut() for bulk, zero-copy mutable access; updates tests accordingly.

Review comment threads (2): crates/dreid-pack/src/pack/phase/dee.rs
@TKanX TKanX merged commit cebbf25 into main Apr 5, 2026
8 checks passed
@TKanX TKanX deleted the feature/13-improve-intra-structure-parallel-efficiency branch April 5, 2026 04:36

Development: merging this pull request may close the issue "Improve Intra-Structure Parallel Efficiency".

2 participants