Comparing changes

base repository: ModelCloud/GPTQModel, base: v5.2.0
head repository: ModelCloud/GPTQModel, compare: v5.4.0

  • 18 commits
  • 39 files changed
  • 5 contributors

Commits on Nov 2, 2025

  1. Update latest news section in README.md (#2166)

    Consolidate latest news entries into a single update.
    Qubitium authored Nov 2, 2025 · ab20a22
  2. run forward pass even for empty subset to produce correct layer outputs (#2161)
    
    * run forward pass even for empty subset to produce correct layer outputs
    
    * add todo comment
    avtc authored Nov 2, 2025 · 1618322
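
The fix in #2161 matters when a layer contributes no quantizable modules: its forward pass still has to run so the next layer sees real activations rather than stale inputs. A minimal sketch of that idea, not the GPTQModel implementation; `find_subsets` and `quantize_subset` are hypothetical placeholders:

```python
# Hypothetical sketch of the #2161 idea: run every layer's forward pass, even
# when it contributes no quantizable modules, so downstream layers still
# receive correct inputs. All helper names are placeholders.
import torch

def quantize_model_layers(layers, inputs, find_subsets, quantize_subset):
    for layer in layers:
        for subset in find_subsets(layer):   # a subset may be empty
            if subset:
                quantize_subset(layer, subset, inputs)
        # The forward pass runs unconditionally: skipping it for an empty
        # subset would feed stale activations into the next layer.
        with torch.no_grad():
            new_inputs = []
            for x in inputs:
                out = layer(x)
                new_inputs.append(out[0] if isinstance(out, tuple) else out)
            inputs = new_inputs
    return inputs
```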

Commits on Nov 3, 2025

  1. Reduce AWQ memory usage (#2167)

    * reduce awq memory usage
    
    * fix samples for ci test
    
    * preallocate workspaces
    
    * update
    
    * more testing
    
    * comments
    
    * fix regression. need to snapshot state.modules
    
    * do not move activations to cpu
    
    * clone og weights to cpu
    
    * fix local memory waste caused by holding on to memory longer than we need it
    
    * inplace ops and benchmark
    
    * log
    
    * log
    Qubitium authored Nov 3, 2025 · a0e065a
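
A hedged sketch of the memory tactics listed in #2167 above (clone original weights to CPU, preallocate a reusable workspace, prefer in-place ops). The helpers are illustrative stand-ins, not the GPTQModel code:

```python
# Illustrative only: the kind of memory discipline described in #2167.
import torch

def snapshot_weights_to_cpu(modules):
    # Keep the original weights on CPU so the GPU copy can be scaled in place
    # and restored later without holding two GPU-resident copies.
    return {name: m.weight.detach().to("cpu", copy=True) for name, m in modules.items()}

def make_workspace(max_rows, max_cols, device, dtype=torch.float16):
    # Preallocate once and slice per subset instead of allocating per layer.
    return torch.empty(max_rows, max_cols, device=device, dtype=dtype)

def apply_scales_inplace(weight, scales):
    # In-place multiply avoids a temporary the size of the weight matrix.
    weight.mul_(scales.to(weight.dtype).view(1, -1))
    return weight
```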
  2. Awq update (#2168)

    * don't emit debug logs
    
    * log
    
    * update comments
    
    * update scores, add single gpu ci test option
    
    * odd version for dev
    Qubitium authored Nov 3, 2025 · 7597ec4

Commits on Nov 4, 2025

  1. Retry partial.to to fix accelerate invalid argument for first moe layer (reapply) (#2169)
    
    * reapply try partial.to
    
    * refactor
    avtc authored Nov 4, 2025 · 1a1c5a5
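
The retry in #2169 (later tuned in #2175, "adjust delay") works around a transient "invalid argument" seen from accelerate when moving the first MoE layer. A rough sketch; the retry count and delay are made-up values, not the ones used upstream:

```python
# Rough sketch of a retried module.to(device) call, in the spirit of #2169/#2175.
import time
from functools import partial

def to_device_with_retry(module, device, attempts=3, delay_s=0.5):
    move = partial(module.to, device)
    for attempt in range(attempts):
        try:
            return move()
        except RuntimeError:
            # accelerate has been seen to raise a transient "invalid argument"
            # on the first MoE layer; back off briefly and retry.
            if attempt == attempts - 1:
                raise
            time.sleep(delay_s)
```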
  2. Awq update (#2172)

    * add awq moe test
    
    * update scores
    
    * fix quant speed regression
    
    * comments
    Qubitium authored Nov 4, 2025 · ee9a956
  3. add :? capture only syntax (#2173)

    * add :? capture only syntax
    
    * add enable_activation_capture control to loop processor
    Qubitium authored Nov 4, 2025 · 3dc11fa
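
The `:?` marker from #2173 flags modules whose activations are captured for calibration without being quantized (paired with the new `enable_activation_capture` control on the loop processor). The parser below is a hypothetical illustration of such a suffix, not GPTQModel's actual syntax handling, and the module names in the asserts are examples only:

```python
# Hypothetical parser for a ":?" capture-only marker on module names.
CAPTURE_ONLY_SUFFIX = ":?"

def parse_module_entry(entry: str):
    """Split an entry into (module_name, capture_only)."""
    if entry.endswith(CAPTURE_ONLY_SUFFIX):
        return entry[: -len(CAPTURE_ONLY_SUFFIX)], True   # capture activations only
    return entry, False                                    # quantize as usual

assert parse_module_entry("mlp.gate:?") == ("mlp.gate", True)
assert parse_module_entry("mlp.experts.#.up_proj") == ("mlp.experts.#.up_proj", False)
```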

Commits on Nov 5, 2025

  1. [FIX] AWQ MoE (#2171)

    * Fixed an issue in AWQ quantization that used the wrong input_feature["mlp"] tensor
    
    * process moe block
    
    * fix merge error
    
    * Obtain the CAPTURE_ONLY_FLAG Module
    
    * Add "mlp" to the subset ["mlp.experts.#.gate_proj", "mlp.experts.#.up_proj"]
    
    * NamedModule override register_forward_hook()
    
    * cleanup
    
    * remove custom override
    
    * new parent node subset merging
    
    * do not quantize gate
    
    * ruff
    
    * fix build_moe_modules_if_need()
    
    * fix wrong inp tensor
    
    * fix expert `mlp.experts.0.down_proj` due to missing prev_op
    ZX-ModelCloud and Qubitium authored Nov 5, 2025 · f956309
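
The subset strings quoted in #2171 use `#` as an expert-index placeholder. A small sketch of expanding them into concrete per-expert module names; the expansion helper is assumed for illustration, not the library's own:

```python
# Assumed helper: expand "#" in subset entries like "mlp.experts.#.gate_proj"
# into one module name per expert, mirroring the subsets discussed in #2171.
def expand_expert_entries(entries, num_experts):
    expanded = []
    for entry in entries:
        if "#" in entry:
            expanded.extend(entry.replace("#", str(i)) for i in range(num_experts))
        else:
            expanded.append(entry)
    return expanded

subset = ["mlp.experts.#.gate_proj", "mlp.experts.#.up_proj"]
print(expand_expert_entries(subset, num_experts=2))
# ['mlp.experts.0.gate_proj', 'mlp.experts.1.gate_proj',
#  'mlp.experts.0.up_proj', 'mlp.experts.1.up_proj']
```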

Commits on Nov 6, 2025

  1. adjust retry partial.to (#2175)

    * adjust retry partial.to
    
    * adjust delay
    avtc authored Nov 6, 2025 · f4984c8
  2. cleanup awq_get_modules_for_scaling() (#2179)

    * cleanup
    
    * cleanup last_module
    ZX-ModelCloud authored Nov 6, 2025 · d5162cc
  3. [FIX] qwen3 moe sparse moe block (#2184)

    * cleanup is_moe_down_block / is_moe_gate_up_block
    
    * cleanup last_up_proj_index
    
    * Filtering MLP modules like Qwen3MoeSparseMoeBlock
    ZX-ModelCloud authored Nov 6, 2025 · e0d4d70
  4. Add module convert (#2183)

    * add module converter
    
    * cleanup
    LRL2-ModelCloud authored Nov 6, 2025 · 21106e8
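
#2183 is described only by its title, "add module converter". The snippet below sketches the general pattern such a converter usually follows, walking a model and swapping target submodules for replacements; `ReplacementLinear` is a stand-in class, not the module #2183 actually produces:

```python
# Generic module-conversion pattern (illustrative; not #2183's converter).
import torch.nn as nn

class ReplacementLinear(nn.Module):
    """Stand-in for whatever module the converter produces."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.inner = linear

    def forward(self, x):
        return self.inner(x)

def convert_modules(model: nn.Module, target=nn.Linear, factory=ReplacementLinear):
    # Recursively replace every `target` child with `factory(child)`.
    for name, child in list(model.named_children()):
        if isinstance(child, target):
            setattr(model, name, factory(child))
        else:
            convert_modules(child, target, factory)
    return model
```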
  5. Make torch fused op compilable (#2182)

    * add compile
    
    * rm inside compile
    jiqing-feng authored Nov 6, 2025 · 9d524ef
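
#2182 adds compilation of the torch fused op and removes it from inside the op ("rm inside compile"); one way to read that is compiling a fused helper once at module scope instead of per call. A hedged sketch with generic dequant math, not GPTQModel's kernel:

```python
# Sketch: compile a fused helper once at import time rather than calling
# torch.compile inside the op itself. The dequantization math is generic.
import torch

def _dequant_matmul(x, qweight, scales, zeros):
    # Generic per-channel dequantization followed by a matmul.
    w = (qweight.to(scales.dtype) - zeros) * scales
    return x @ w.t()

# Compiled once; repeated calls reuse the compiled graph.
dequant_matmul = torch.compile(_dequant_matmul)
```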

Commits on Nov 7, 2025

  1. Cleanup (#2185)

    * kimi k2 test
    
    * ruff
    Qubitium authored Nov 7, 2025 · 936023f
  2. 3ad365c
  3. 084f2ad
  4. [FIX] version("triton") crash on torch+xpu (#2188)

    * Fixed version("triton") crash on torch+xpu
    
    * format
    ZX-ModelCloud authored Nov 7, 2025 · be7a43b
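
A sketch of the kind of guard #2188 describes: torch+xpu builds may not ship triton, so a bare `importlib.metadata.version("triton")` call can raise rather than return a version. This is an illustration of the guard pattern, not the patch itself:

```python
# Illustrative guard for querying triton's version when it may not be
# installed (e.g. torch+xpu environments), in the spirit of #2188.
from importlib.metadata import PackageNotFoundError, version

def triton_version_or_none():
    try:
        return version("triton")
    except PackageNotFoundError:
        return None
```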

Commits on Nov 9, 2025

  1. AWQ Torch Fused Kernel (#2190)

    * use torch.ops.aten fused ops for awq
    
    * cleanup
    
    * cleanup2
    
    * float16 only
    
    * log
    
    * fused path
    
    * add gptq torch fused doc on layout
    
    * fix awq transformation
    
    * log rtol/atol
    
    * cleanup
    
    * cleanup2
    
    * cleanup 3
    
    * remove unused
    
    * inline methods
    
    * remove debug logs
    
    * avoid clone
    
    * merge code with gptq torch fused
    
    * cleanup, add XPU todo
    
    * make sure to test both xpu and cpu
    
    * xpu tests
    
    * fix xpu transform
    
    * cleanup
    
    * cleanup2
    
    * cleanup 3 plus test logs
    
    * tabulate logs
    
    * prepare for v5.4.0 release
    Qubitium authored Nov 9, 2025 · e0da12a
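
#2190 notes the fused AWQ path is float16 only and that tests log rtol/atol against the reference path. A minimal sketch of that comparison using a plain `F.linear` reference; `fused_linear` is an assumed stand-in for the real torch.ops.aten-based kernel:

```python
# Minimal float16 check of a fused path against a reference linear, logging
# tolerances as hinted by the "log rtol/atol" item in #2190.
import torch
import torch.nn.functional as F

def check_fused(fused_linear, x, weight, rtol=1e-2, atol=1e-2):
    assert x.dtype == torch.float16 and weight.dtype == torch.float16
    ref = F.linear(x, weight)
    out = fused_linear(x, weight)
    max_err = (out - ref).abs().max().item()
    print(f"rtol={rtol} atol={atol} max_abs_err={max_err:.4e}")
    return torch.allclose(out, ref, rtol=rtol, atol=atol)
```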