Comparing changes

base repository: ModelCloud/GPTQModel, base: v5.2.0
head repository: ModelCloud/GPTQModel, compare: v5.4.0

  • 18 commits
  • 39 files changed
  • 5 contributors

Commits on Nov 2, 2025

  1. Update latest news section in README.md (#2166)

    Consolidate latest news entries into a single update.
    Qubitium authored Nov 2, 2025 · ab20a22
  2. run forward pass even for empty subset to produce correct layer outputs (#2161)
    
    * run forward pass even for empty subset to produce correct layer outputs
    
    * add todo comment
    avtc authored Nov 2, 2025 · 1618322
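
The fix in #2161 matters when a layer contributes no quantizable modules: its forward pass still has to run so the next layer sees real activations rather than stale inputs. A minimal sketch of that idea, not the GPTQModel implementation; `find_subsets` and `quantize_subset` are hypothetical placeholders:

```python
# Hypothetical sketch of the #2161 idea: run every layer's forward pass, even
# when it contributes no quantizable modules, so downstream layers still
# receive correct inputs. All helper names are placeholders.
import torch

def quantize_model_layers(layers, inputs, find_subsets, quantize_subset):
    for layer in layers:
        for subset in find_subsets(layer):   # a subset may be empty
            if subset:
                quantize_subset(layer, subset, inputs)
        # The forward pass runs unconditionally: skipping it for an empty
        # subset would feed stale activations into the next layer.
        with torch.no_grad():
            new_inputs = []
            for x in inputs:
                out = layer(x)
                new_inputs.append(out[0] if isinstance(out, tuple) else out)
            inputs = new_inputs
    return inputs
```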

Commits on Nov 3, 2025

  1. Reduce AWQ memory usage (#2167)

    * reduce awq memory usage
    
    * fix samples for ci test
    
    * preallocate workspaces
    
    * update
    
    * more testing
    
    * comments
    
    * fix regression. need to snapshot state.modules
    
    * do not move activations to cpu
    
    * clone og weights to cpu
    
    * fix local memory waste caused by holding on to memory longer than we need it
    
    * inplace ops and benchmark
    
    * log
    
    * log
    Qubitium authored Nov 3, 2025 · a0e065a
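
A hedged sketch of the memory tactics listed in #2167 above (clone original weights to CPU, preallocate a reusable workspace, prefer in-place ops). The helpers are illustrative stand-ins, not the GPTQModel code:

```python
# Illustrative only: the kind of memory discipline described in #2167.
import torch

def snapshot_weights_to_cpu(modules):
    # Keep the original weights on CPU so the GPU copy can be scaled in place
    # and restored later without holding two GPU-resident copies.
    return {name: m.weight.detach().to("cpu", copy=True) for name, m in modules.items()}

def make_workspace(max_rows, max_cols, device, dtype=torch.float16):
    # Preallocate once and slice per subset instead of allocating per layer.
    return torch.empty(max_rows, max_cols, device=device, dtype=dtype)

def apply_scales_inplace(weight, scales):
    # In-place multiply avoids a temporary the size of the weight matrix.
    weight.mul_(scales.to(weight.dtype).view(1, -1))
    return weight
```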
  2. Awq update (#2168)

    * don't emit debug logs
    
    * log
    
    * update comments
    
    * update scores, add single gpu ci test option
    
    * odd version for dev
    Qubitium authored Nov 3, 2025 · 7597ec4

Commits on Nov 4, 2025

  1. Retry partial.to to fix accelerate invalid argument for first moe layer (reapply) (#2169)
    
    * reapply try partial.to
    
    * refactor
    avtc authored Nov 4, 2025 · 1a1c5a5
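
The retry in #2169 (later tuned in #2175, "adjust delay") works around a transient "invalid argument" seen from accelerate when moving the first MoE layer. A rough sketch; the retry count and delay are made-up values, not the ones used upstream:

```python
# Rough sketch of a retried module.to(device) call, in the spirit of #2169/#2175.
import time
from functools import partial

def to_device_with_retry(module, device, attempts=3, delay_s=0.5):
    move = partial(module.to, device)
    for attempt in range(attempts):
        try:
            return move()
        except RuntimeError:
            # accelerate has been seen to raise a transient "invalid argument"
            # on the first MoE layer; back off briefly and retry.
            if attempt == attempts - 1:
                raise
            time.sleep(delay_s)
```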
  2. Awq update (#2172)

    * add awq moe test
    
    * update scores
    
    * fix quant speed regression
    
    * comments
    Qubitium authored Nov 4, 2025 · ee9a956
  3. add :? capture only syntax (#2173)

    * add :? capture only syntax
    
    * add enable_activation_capture control to loop processor
    Qubitium authored Nov 4, 2025 · 3dc11fa
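
The `:?` marker from #2173 flags modules whose activations are captured for calibration without being quantized (paired with the new `enable_activation_capture` control on the loop processor). The parser below is a hypothetical illustration of such a suffix, not GPTQModel's actual syntax handling, and the module names in the asserts are examples only:

```python
# Hypothetical parser for a ":?" capture-only marker on module names.
CAPTURE_ONLY_SUFFIX = ":?"

def parse_module_entry(entry: str):
    """Split an entry into (module_name, capture_only)."""
    if entry.endswith(CAPTURE_ONLY_SUFFIX):
        return entry[: -len(CAPTURE_ONLY_SUFFIX)], True   # capture activations only
    return entry, False                                    # quantize as usual

assert parse_module_entry("mlp.gate:?") == ("mlp.gate", True)
assert parse_module_entry("mlp.experts.#.up_proj") == ("mlp.experts.#.up_proj", False)
```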

Commits on Nov 5, 2025

  1. [FIX] AWQ MoE (#2171)

    * Fixed an issue in AWQ quantization that used the wrong input_feature["mlp"] tensor
    
    * process moe block
    
    * fix merge error
    
    * Obtain the CAPTURE_ONLY_FLAG Module
    
    * Add "mlp" to the subset ["mlp.experts.#.gate_proj", "mlp.experts.#.up_proj"]
    
    * NamedModule override register_forward_hook()
    
    * cleanup
    
    * remove custom override
    
    * new parent node subset merging
    
    * do not quantize gate
    
    * ruff
    
    * fix build_moe_modules_if_need()
    
    * fix wrong inp tensor
    
    * fix expert `mlp.experts.0.down_proj` due to missing prev_op
    ZX-ModelCloud and Qubitium authored Nov 5, 2025 · f956309
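
The subset strings quoted in #2171 use `#` as an expert-index placeholder. A small sketch of expanding them into concrete per-expert module names; the expansion helper is assumed for illustration, not the library's own:

```python
# Assumed helper: expand "#" in subset entries like "mlp.experts.#.gate_proj"
# into one module name per expert, mirroring the subsets discussed in #2171.
def expand_expert_entries(entries, num_experts):
    expanded = []
    for entry in entries:
        if "#" in entry:
            expanded.extend(entry.replace("#", str(i)) for i in range(num_experts))
        else:
            expanded.append(entry)
    return expanded

subset = ["mlp.experts.#.gate_proj", "mlp.experts.#.up_proj"]
print(expand_expert_entries(subset, num_experts=2))
# ['mlp.experts.0.gate_proj', 'mlp.experts.1.gate_proj',
#  'mlp.experts.0.up_proj', 'mlp.experts.1.up_proj']
```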

Commits on Nov 6, 2025

  1. adjust retry partial.to (#2175)

    * adjust retry partial.to
    
    * adjust delay
    avtc authored Nov 6, 2025 · f4984c8
  2. cleanup awq_get_modules_for_scaling() (#2179)

    * cleanup
    
    * cleanup last_module
    ZX-ModelCloud authored Nov 6, 2025 · d5162cc
  3. [FIX] qwen3 moe sparse moe block (#2184)

    * cleanup is_moe_down_block / is_moe_gate_up_block
    
    * cleanup last_up_proj_index
    
    * Filtering MLP modules like Qwen3MoeSparseMoeBlock
    ZX-ModelCloud authored Nov 6, 2025 · e0d4d70
  4. Add module convert (#2183)

    * add module converter
    
    * cleanup
    LRL2-ModelCloud authored Nov 6, 2025 · 21106e8
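
#2183 is described only by its title, "add module converter". The snippet below sketches the general pattern such a converter usually follows, walking a model and swapping target submodules for replacements; `ReplacementLinear` is a stand-in class, not the module #2183 actually produces:

```python
# Generic module-conversion pattern (illustrative; not #2183's converter).
import torch.nn as nn

class ReplacementLinear(nn.Module):
    """Stand-in for whatever module the converter produces."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.inner = linear

    def forward(self, x):
        return self.inner(x)

def convert_modules(model: nn.Module, target=nn.Linear, factory=ReplacementLinear):
    # Recursively replace every `target` child with `factory(child)`.
    for name, child in list(model.named_children()):
        if isinstance(child, target):
            setattr(model, name, factory(child))
        else:
            convert_modules(child, target, factory)
    return model
```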
  5. Make torch fused op compilable (#2182)

    * add compile
    
    * rm inside compile
    jiqing-feng authored Nov 6, 2025 · 9d524ef
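
#2182 adds compilation of the torch fused op and removes it from inside the op ("rm inside compile"); one way to read that is compiling a fused helper once at module scope instead of per call. A hedged sketch with generic dequant math, not GPTQModel's kernel:

```python
# Sketch: compile a fused helper once at import time rather than calling
# torch.compile inside the op itself. The dequantization math is generic.
import torch

def _dequant_matmul(x, qweight, scales, zeros):
    # Generic per-channel dequantization followed by a matmul.
    w = (qweight.to(scales.dtype) - zeros) * scales
    return x @ w.t()

# Compiled once; repeated calls reuse the compiled graph.
dequant_matmul = torch.compile(_dequant_matmul)
```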

Commits on Nov 7, 2025

  1. Cleanup (#2185)

    * kimi k2 test
    
    * ruff
    Qubitium authored Nov 7, 2025 · 936023f
  2. 3ad365c
  3. 084f2ad
  4. [FIX] version("triton") crash on torch+xpu (#2188)

    * Fixed version("triton") crash on torch+xpu
    
    * format
    ZX-ModelCloud authored Nov 7, 2025 · be7a43b
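
A sketch of the kind of guard #2188 describes: torch+xpu builds may not ship triton, so a bare `importlib.metadata.version("triton")` call can raise rather than return a version. This is an illustration of the guard pattern, not the patch itself:

```python
# Illustrative guard for querying triton's version when it may not be
# installed (e.g. torch+xpu environments), in the spirit of #2188.
from importlib.metadata import PackageNotFoundError, version

def triton_version_or_none():
    try:
        return version("triton")
    except PackageNotFoundError:
        return None
```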

Commits on Nov 9, 2025

  1. AWQ Torch Fused Kernel (#2190)

    * use torch.ops.aten fused ops for awq
    
    * cleanup
    
    * cleanup2
    
    * float16 only
    
    * log
    
    * fused path
    
    * add gptq torch fused doc on layout
    
    * fix awq transformation
    
    * log rtol/atol
    
    * cleanup
    
    * cleanup2
    
    * cleanup 3
    
    * remove unused
    
    * inline methods
    
    * remove debug logs
    
    * avoid clone
    
    * merge code with gptq torch fused
    
    * cleanup, add XPU todo
    
    * make sure to test both xpu and cpu
    
    * xpu tests
    
    * fix xpu transform
    
    * cleanup
    
    * cleanup2
    
    * cleanup 3 plus test logs
    
    * tabulate logs
    
    * prepare for v5.4.0 release
    Qubitium authored Nov 9, 2025 · e0da12a
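
#2190 notes the fused AWQ path is float16 only and that tests log rtol/atol against the reference path. A minimal sketch of that comparison using a plain `F.linear` reference; `fused_linear` is an assumed stand-in for the real torch.ops.aten-based kernel:

```python
# Minimal float16 check of a fused path against a reference linear, logging
# tolerances as hinted by the "log rtol/atol" item in #2190.
import torch
import torch.nn.functional as F

def check_fused(fused_linear, x, weight, rtol=1e-2, atol=1e-2):
    assert x.dtype == torch.float16 and weight.dtype == torch.float16
    ref = F.linear(x, weight)
    out = fused_linear(x, weight)
    max_err = (out - ref).abs().max().item()
    print(f"rtol={rtol} atol={atol} max_abs_err={max_err:.4e}")
    return torch.allclose(out, ref, rtol=rtol, atol=atol)
```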