@Qubitium (Collaborator) commented Apr 2, 2025

At block_size = tensor.columns // 2 == 1024, torch.float32, 2048x2048

Performance Summary (50 runs on 2048x2048 matrix):
Block Version:
  Average Time: 0.0011 ± 0.0005 s
  Peak Memory: 72.12 ± 0.00 MB

Standard Version:
  Average Time: 0.0026 ± 0.0007 s
  Peak Memory: 88.13 ± 0.00 MB

Numerical Error Analysis (max absolute difference):
Mean error: 3.3730e-02
Max error: 3.8948e-02
Min error: 3.1047e-02
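
For readers wondering how figures like these could be produced, here is a minimal sketch. It assumes each benchmark run yields one "max absolute difference" between the block and standard results, and that the mean/max/min are then taken across runs. The helper names `max_abs_diff` and `summarize_errors` are hypothetical and not taken from this PR.

```python
from statistics import mean


def max_abs_diff(x, y):
    """Largest element-wise absolute difference between two equally shaped matrices."""
    return max(
        abs(a - b)
        for row_x, row_y in zip(x, y)
        for a, b in zip(row_x, row_y)
    )


def summarize_errors(per_run_errors):
    """Aggregate the per-run max-abs-diff values (assumed: one value per run)."""
    return {
        "mean": mean(per_run_errors),
        "max": max(per_run_errors),
        "min": min(per_run_errors),
    }


# Toy data standing in for the outputs of two inverse implementations.
block_result = [[1.0, 2.5], [2.0, 4.0]]
standard_result = [[1.0, 2.0], [3.0, 4.0]]
errors = [max_abs_diff(block_result, standard_result), 3.0, 2.0]
summary = summarize_errors(errors)
```

With torch tensors the per-run value would typically be `(x - y).abs().max()`; plain lists are used here only to keep the sketch dependency-free.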

At block_size = tensor.columns // 1 == 2048, torch.float32, 2048x2048

Performance Summary (50 runs on 2048x2048 matrix):
Block Version:
  Average Time: 0.0017 ± 0.0004 s
  Peak Memory: 88.12 ± 0.00 MB

Standard Version:
  Average Time: 0.0025 ± 0.0009 s
  Peak Memory: 88.13 ± 0.00 MB

Numerical Error Analysis (max absolute difference):
Mean error: 1.9632e-08
Max error: 2.6077e-08
Min error: 1.6764e-08
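
For context on what a block-wise inverse with a `block_size` parameter can look like, below is a hedged sketch based on the textbook 2x2 block partition with a Schur complement. It is not the implementation merged in this PR: the function name `block_spd_inverse` and the recursion scheme are assumptions made for illustration, and no claim is made about its speed or memory use. In exact arithmetic this partitioned form is exact, so it is not meant to explain the error figures reported above.

```python
import torch


def block_spd_inverse(a: torch.Tensor, block_size: int) -> torch.Tensor:
    """Invert a symmetric positive-definite matrix via block partitioning.

    Hypothetical helper for illustration; not the code from this PR.
    """
    n = a.shape[0]
    if block_size >= n:
        # A single block covers the whole matrix: plain Cholesky inverse.
        return torch.cholesky_inverse(torch.linalg.cholesky(a))

    k = block_size
    a11, a12, a22 = a[:k, :k], a[:k, k:], a[k:, k:]

    # Inverse of the leading k x k block from its Cholesky factor.
    a11_inv = torch.cholesky_inverse(torch.linalg.cholesky(a11))

    # Schur complement of a11; it is SPD whenever `a` is SPD.
    t = a11_inv @ a12
    s = a22 - a12.T @ t

    # Recurse on the trailing block so every factorization stays small.
    s_inv = block_spd_inverse(s, block_size)

    # Assemble the inverse from the standard block-inverse formulas.
    out = torch.empty_like(a)
    upper = -t @ s_inv
    out[:k, :k] = a11_inv + t @ s_inv @ t.T
    out[:k, k:] = upper
    out[k:, :k] = upper.T
    out[k:, k:] = s_inv
    return out


# Usage on a small, well-conditioned SPD matrix.
torch.manual_seed(0)
m = torch.randn(8, 8, dtype=torch.float64)
spd = m @ m.T + 8.0 * torch.eye(8, dtype=torch.float64)
inv_blocked = block_spd_inverse(spd, block_size=4)
```

When `block_size` equals the matrix dimension, the sketch falls through to a single `torch.cholesky_inverse` call, which corresponds to the `tensor.columns // 1` configuration in the second benchmark.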

@Qubitium Qubitium merged commit b9e61ce into main Apr 2, 2025
2 of 4 checks passed
@Qubitium Qubitium deleted the low-memory-choleksy-inverse branch April 2, 2025 02:51
@Qubitium Qubitium changed the title from ">2x faster cholesky inverse and avoid oom when possible" to "faster cholesky inverse and avoid oom when possible" Apr 2, 2025