@Qubitium
Collaborator

@Qubitium Qubitium commented Sep 17, 2025

Ugly ugly hacks. Do not read this PR for your own sanity. It will be cleaned up later. Lots of debug, random var names, etc. Seriously, don't read the commits.

Current progress:

  • 23.9% CPU memory saving for the entire end-to-end quantization + packing process.
  • 27% CPU memory saving if we only count the quantization stage without packing. Packing introduces more CPU RAM usage, which can be fixed in future commits.

Target: > 75%

Signed-off-by: Qubitium <[email protected]>
Signed-off-by: Qubitium <[email protected]>
Signed-off-by: Qubitium <[email protected]>
@Qubitium Qubitium marked this pull request as draft September 17, 2025 13:46
Signed-off-by: Qubitium <[email protected]>
@Qubitium
Collaborator Author

6.4 GB to 4.1 GB for quantization (excluding packing) => 35.9% saving...

Pizza time!

Signed-off-by: Qubitium <[email protected]>
@Qubitium
Collaborator Author

6.4 GB to 2.4 GB (excluding packing) => 62.5% saving...

Ice cream?

Signed-off-by: Qubitium <[email protected]>
Signed-off-by: Qubitium <[email protected]>
@Qubitium Qubitium marked this pull request as ready for review September 18, 2025 08:00
@Qubitium Qubitium marked this pull request as draft September 18, 2025 08:20
@Qubitium Qubitium marked this pull request as ready for review September 19, 2025 00:21
Signed-off-by: Qubitium <[email protected]>
@Qubitium
Collaborator Author

6.4 GB to 1.7 GB (excluding packing) => 73.5% saving...
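
For anyone checking the arithmetic, the savings reported in this thread follow from (before - after) / before. Below is a minimal sketch, not code from this PR: `saving_pct`, `BASELINE_GB`, and `TARGET_PCT` are illustrative names, and the inputs are the rounded GB figures quoted in the comments above.

```python
def saving_pct(before_gb: float, after_gb: float) -> float:
    """Return the relative memory saving as a percentage of the baseline."""
    if before_gb <= 0:
        raise ValueError("baseline must be positive")
    return (before_gb - after_gb) / before_gb * 100.0


# Assumed inputs: the rounded figures quoted in the comments, not raw measurements.
BASELINE_GB = 6.4   # quantization-stage CPU memory before the change
TARGET_PCT = 75.0   # stated goal: a saving greater than 75%

for after_gb in (4.1, 2.4, 1.7):
    pct = saving_pct(BASELINE_GB, after_gb)
    status = "meets" if pct > TARGET_PCT else "below"
    print(f"{BASELINE_GB} GB -> {after_gb} GB: {pct:.1f}% saving ({status} target)")
```

With these rounded inputs the last step works out to roughly 73.4%, which matches the reported 73.5% to within the rounding of the GB figures and is still just under the > 75% target.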

@Qubitium Qubitium merged commit 785ea82 into main Sep 19, 2025
5 checks passed
@CSY-ModelCloud CSY-ModelCloud deleted the turtle-in-a-half-shell branch December 5, 2025 02:00