perf(pack): Improve Intra-Structure Parallel Efficiency by TKanX · Pull Request #14 · caltechmsc/dreid-pack

TKanX · 2026-04-04T23:37:45Z

Summary:

Eliminated redundant work and parallel dispatch overhead across all compute-bound packing phases. DEE: 1.59 s → 0.81 s (−49%) via worklist convergence and prebuilt per-slot neighbor caches. DP adjacency migrated to BitMatrix (64-bit packed), reducing memory footprint and simplifying graph-algorithm signatures throughout. Pair energy writes directly to table slices, removing intermediate allocation and a nested parallel loop. DB379 quality unchanged (χ₁–₄ (20°) 71.5%, RMSD 0.728 Å); all 401 tests pass.

Changes:

DEE (dee.rs): Worklist convergence — each round only rechecks slots neighboring a newly pruned slot, skipping unchanged slots entirely. Alive candidates cached per slot (sorted by ascending self-energy for earlier witness hits) and kept in sync with pruning; graph is traversed once before the loop rather than per round. Adaptive parallelism via with_min_len.
DP (dp.rs): Vec<bool> adjacency replaced with BitMatrix (u64-packed rows) — denser, cache-friendly, and cleaner API; all graph-algorithm functions (mcs_order, is_peo, fill_in, etc.) simplified accordingly. Standalone build_alive_table/topo_order helpers inlined into the solve path. Separator DP uses adaptive with_min_len parallelism.
Energy table (energy.rs): PairEnergyTable::set() replaced by matrices_mut(), which returns non-overlapping mutable slices via split_at_mut for zero-copy parallel bulk writes.
Pair energy (pair.rs): compute() writes directly into table slices from matrices_mut(), eliminating an intermediate Vec. Inner rotamer loop made sequential within the per-edge parallel dispatch.
Prune (prune.rs): Removed redundant nested parallel iterator from frame energy computation.
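The worklist convergence described for DEE can be sketched as follows. This is a minimal, sequential illustration and not the code in dee.rs: `converge`, `try_prune`, and `demo_chain` are hypothetical names, and the real elimination test (energies, witnesses, Rayon dispatch) is replaced by a caller-supplied closure that reports whether it pruned anything in a slot.

```rust
/// Re-runs `try_prune` on "dirty" slots until a round prunes nothing.
/// `neighbors[s]` lists the slots adjacent to slot `s`.
/// `try_prune(s)` must return true only if it removed a candidate from `s`;
/// termination relies on each slot having finitely many candidates.
/// Returns the total number of slot checks performed.
fn converge<F>(neighbors: &[Vec<usize>], mut try_prune: F) -> usize
where
    F: FnMut(usize) -> bool,
{
    let n = neighbors.len();
    let mut dirty = vec![true; n]; // the first round checks every slot
    let mut checks = 0;
    while dirty.iter().any(|&d| d) {
        let mut next = vec![false; n];
        for s in 0..n {
            if !dirty[s] {
                continue; // unchanged neighborhood: skip entirely
            }
            checks += 1;
            if try_prune(s) {
                // Only neighbors of a newly pruned slot need a recheck.
                for &t in &neighbors[s] {
                    next[t] = true;
                }
            }
        }
        dirty = next;
    }
    checks
}

/// Toy usage: a 4-slot chain 0-1-2-3 where only slot 0 can prune, once.
fn demo_chain() -> usize {
    let neighbors = vec![vec![1], vec![0, 2], vec![1, 3], vec![2]];
    let mut prunable = vec![1usize, 0, 0, 0];
    converge(&neighbors, |s| {
        if prunable[s] > 0 {
            prunable[s] -= 1;
            true
        } else {
            false
        }
    })
}
```

In the toy chain, the second round rechecks only slot 1 (the sole neighbor of the pruned slot), so `demo_chain()` performs 5 checks where two full sweeps would perform 8.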

Copilot

Pull request overview

Performance-focused refactor of packing phases to reduce redundant work and parallel overhead, including denser graph representations and zero-copy pair-energy writes into the global tables.

Changes:

Reworked DEE to converge via a neighbor-driven worklist with cached per-slot alive candidates and precomputed neighbor-edge metadata.
Switched DP adjacency to a packed BitMatrix and refactored tree-decomposition DP to use cached per-node/per-edge info with adaptive parallelism.
Updated pair-energy computation and PairEnergyTable APIs to write directly into non-overlapping mutable matrix slices, removing intermediate allocations and nested parallelism.
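For readers unfamiliar with packed adjacency, a u64-packed bit matrix along the lines mentioned above might look like the sketch below. The layout and method names (`set_edge`, `get`, `neighbors`, `common_neighbors`) are assumptions made for illustration; the `BitMatrix` in dp.rs may be organized differently.

```rust
/// Square boolean matrix with each row packed into 64-bit words.
/// Hypothetical layout, for illustration only.
struct BitMatrix {
    n: usize,
    words_per_row: usize,
    bits: Vec<u64>,
}

impl BitMatrix {
    fn new(n: usize) -> Self {
        let words_per_row = (n + 63) / 64;
        Self { n, words_per_row, bits: vec![0; n * words_per_row] }
    }

    /// Marks the undirected edge (i, j) by setting both directions.
    fn set_edge(&mut self, i: usize, j: usize) {
        assert!(i < self.n && j < self.n);
        self.bits[i * self.words_per_row + j / 64] |= 1u64 << (j % 64);
        self.bits[j * self.words_per_row + i / 64] |= 1u64 << (i % 64);
    }

    fn get(&self, i: usize, j: usize) -> bool {
        assert!(i < self.n && j < self.n);
        (self.bits[i * self.words_per_row + j / 64] >> (j % 64)) & 1 == 1
    }

    fn row(&self, i: usize) -> &[u64] {
        &self.bits[i * self.words_per_row..(i + 1) * self.words_per_row]
    }

    /// Lists the neighbors of `i` in ascending order by scanning set bits.
    fn neighbors(&self, i: usize) -> Vec<usize> {
        let mut out = Vec::new();
        for (w_idx, &word) in self.row(i).iter().enumerate() {
            let mut w = word;
            while w != 0 {
                out.push(w_idx * 64 + w.trailing_zeros() as usize);
                w &= w - 1; // clear the lowest set bit
            }
        }
        out
    }

    /// Counts vertices adjacent to both `i` and `j`, one AND + popcount per word.
    fn common_neighbors(&self, i: usize, j: usize) -> usize {
        self.row(i)
            .iter()
            .zip(self.row(j))
            .map(|(a, b)| (a & b).count_ones() as usize)
            .sum()
    }
}
```

Compared with a `Vec<bool>` adjacency, each row here uses one bit per vertex, and set operations over neighborhoods reduce to word-wise AND and popcount, which is the kind of operation fill-in and elimination-order checks tend to repeat.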

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| crates/dreid-pack/src/pack/phase/prune.rs | Removes redundant nested parallel iterator in frame-energy computation. |
| crates/dreid-pack/src/pack/phase/pair.rs | Writes pair energies directly into `PairEnergyTable` edge slices in a per-edge parallel loop. |
| crates/dreid-pack/src/pack/phase/dp.rs | Introduces `BitMatrix` adjacency + refactors elimination/DP execution and caching. |
| crates/dreid-pack/src/pack/phase/dee.rs | Implements worklist-based DEE convergence with cached alive sets and prebuilt edge metadata. |
| crates/dreid-pack/src/pack/model/spatial.rs | Comment spelling correction (“initialized”). |
| crates/dreid-pack/src/pack/model/energy.rs | Replaces per-entry `set()` with `matrices_mut()` for bulk, zero-copy mutable access; updates tests accordingly. |
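
The `matrices_mut()` idea of handing out non-overlapping mutable slices can be illustrated with the standard library alone. In this sketch, `split_by_lens` and `fill_edges` are hypothetical helpers, the per-edge lengths and the values written are placeholders, and `std::thread::scope` stands in for the Rayon dispatch that `with_min_len` suggests the crate uses.

```rust
use std::thread;

/// Splits one flat buffer into consecutive, non-overlapping mutable slices,
/// one per entry in `lens`. Panics if the lengths exceed the buffer.
fn split_by_lens<'a>(mut buf: &'a mut [f32], lens: &[usize]) -> Vec<&'a mut [f32]> {
    let mut out = Vec::with_capacity(lens.len());
    for &len in lens {
        // Take the remaining buffer by value so both halves keep lifetime 'a.
        let rest = std::mem::take(&mut buf);
        let (head, tail) = rest.split_at_mut(len);
        out.push(head);
        buf = tail;
    }
    out
}

/// Fills each edge's slice on its own thread, with no intermediate Vec
/// and no locking: the borrow checker guarantees the slices are disjoint.
fn fill_edges(buf: &mut [f32], lens: &[usize]) {
    let slices = split_by_lens(buf, lens);
    thread::scope(|s| {
        for (edge, slice) in slices.into_iter().enumerate() {
            s.spawn(move || {
                // Sequential inner loop within the per-edge task.
                for (k, x) in slice.iter_mut().enumerate() {
                    *x = (edge * 100 + k) as f32; // placeholder "energy"
                }
            });
        }
    });
}
```

Calling `fill_edges(&mut table, &[2, 1, 3])` on a six-element buffer writes each edge's block in place. Because every task owns its slice exclusively, results land directly in the final table rather than being collected and copied afterward.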

TKanX added 9 commits April 3, 2026 18:52

perf(pack): Improve tree decomposition performance

02872af

refactor(pack): Simplify initialization of fill cost and heap

91b96f0

perf(pack): Enhance DP structure with BitMatrix for improved efficiency

45081d7

refactor(pack): Remove set method from PairEnergyTable and add matrices_mut method

2eede26

perf(pack): Optimize pair energy computation by writing directly to table slices

5925790

perf(pack): Improve parallel efficiency in frame energy computation by removing unnecessary sub parallel iterator

0d4b8cb

refactor(pack): Simplify dp parallel iteration with adaptive chunking

6c963d8

perf(pack): Streamline DEE phase processing and use prebuilt neighbor caches

e5c401c

docs(pack): Correct spelling of "initialized" in comments for clarity

a12e23c

TKanX self-assigned this Apr 4, 2026

TKanX added the enhancement ✨ and performance ⚡ labels Apr 4, 2026

TKanX linked an issue Apr 4, 2026 that may be closed by this pull request: Improve Intra-Structure Parallel Efficiency #13 (closed, 6 tasks)

Copilot AI review requested due to automatic review settings April 4, 2026 23:37

Copilot started reviewing on behalf of TKanX April 4, 2026 23:38

Copilot AI reviewed Apr 4, 2026

Comment thread crates/dreid-pack/src/pack/phase/dee.rs

Comment thread crates/dreid-pack/src/pack/phase/dee.rs

TKanX merged commit cebbf25 into main Apr 5, 2026
8 checks passed

TKanX deleted the feature/13-improve-intra-structure-parallel-efficiency branch April 5, 2026 04:36