Comparing changes

* unit test use tinyllama-15M model
* load dataset with range 2048
* change load_dataset, download model to local and modify config.json
* modify model path
* Update test_perplexity.py
* Update test_perplexity.py

---------

Co-authored-by: Qubitium-ModelCloud <[email protected]>

* add transformers integration
* use gptqmodel
* Update hf_quantizer_gptq.py
* add monkey_patch_gptq_transformers()
* cleanup
* Fix issue: incorrect qlinear in transformers integration quantization model
* add unit tests of Transformers integration
* monkey patch model.save_pretrained()
* cleanup
* cleanup
* rename monkey_patch_gptq_transformers() to monkey_patch_gptqmodel_into_transformers()
* select_quant_linear() remove "disable_exllama" param

---------

Co-authored-by: Qubitium-ModelCloud <[email protected]>

Co-authored-by: LRL-ModelCloud <[email protected]>

Commits on Jul 3, 2024

Commits on Jul 4, 2024
