Validate quantization #315

Merged
yunfeng-scale merged 5 commits into main from yunfeng-validate-quantization on Oct 12, 2023
Contributor

yunfeng-scale commented Oct 10, 2023

Validate quantization values when creating endpoints


  GIT_TAG: str = os.environ.get("GIT_TAG", "GIT_TAG_NOT_FOUND")
- if GIT_TAG == "GIT_TAG_NOT_FOUND":
+ if GIT_TAG == "GIT_TAG_NOT_FOUND" and "pytest" not in sys.modules:
Contributor Author

make pytest work without specifying GIT_TAG
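
For readers skimming the thread, here is a minimal sketch of the pattern in this hunk: skip the fail-fast check when pytest has been imported. The function name `resolve_git_tag`, its injectable `environ` and `modules` parameters, and the `ValueError` are assumptions for illustration only; the diff does not show what the branch body actually does.

```python
import os
import sys


def resolve_git_tag(environ=None, modules=None) -> str:
    """Return GIT_TAG, failing fast outside of pytest when it is unset."""
    # Both arguments are hypothetical hooks so the check can be exercised
    # without touching the real process environment.
    environ = os.environ if environ is None else environ
    modules = sys.modules if modules is None else modules

    git_tag = environ.get("GIT_TAG", "GIT_TAG_NOT_FOUND")
    # pytest is imported before tests are collected, so finding it in
    # sys.modules is a cheap signal that a test run is in progress.
    if git_tag == "GIT_TAG_NOT_FOUND" and "pytest" not in modules:
        # Assumed failure mode; the real branch body is not in the diff.
        raise ValueError("GIT_TAG environment variable must be set")
    return git_tag
```

Under pytest the placeholder value is returned instead of raising, so tests can import the module without exporting GIT_TAG first.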

LLMInferenceFramework.DEEPSPEED: [],
LLMInferenceFramework.TEXT_GENERATION_INFERENCE: [Quantization.BITSANDBYTES],
LLMInferenceFramework.VLLM: [Quantization.AWQ],
LLMInferenceFramework.LIGHTLLM: [],
Contributor

probably best for a separate PR, but can you update the docs to specify which models in the model zoo support lightllm as an inference framework?
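
A rough sketch of how a per-framework mapping like the one in this hunk could drive validation at endpoint creation. The enum string values, the names `SUPPORTED_QUANTIZATIONS` and `validate_quantization`, and the stand-in exception class are assumptions; only the enum member names and the four mapping entries come from the diff.

```python
from enum import Enum
from typing import Dict, List, Optional


class ObjectHasInvalidValueException(Exception):
    """Stand-in for the project's exception of the same name."""


class LLMInferenceFramework(Enum):
    # Member names follow the diff; the string values are assumed.
    DEEPSPEED = "deepspeed"
    TEXT_GENERATION_INFERENCE = "text_generation_inference"
    VLLM = "vllm"
    LIGHTLLM = "lightllm"


class Quantization(Enum):
    BITSANDBYTES = "bitsandbytes"
    AWQ = "awq"


# Mirrors the four entries shown in the hunk above.
SUPPORTED_QUANTIZATIONS: Dict[LLMInferenceFramework, List[Quantization]] = {
    LLMInferenceFramework.DEEPSPEED: [],
    LLMInferenceFramework.TEXT_GENERATION_INFERENCE: [Quantization.BITSANDBYTES],
    LLMInferenceFramework.VLLM: [Quantization.AWQ],
    LLMInferenceFramework.LIGHTLLM: [],
}


def validate_quantization(
    quantize: Optional[Quantization], framework: LLMInferenceFramework
) -> None:
    """Reject a quantization the chosen framework does not list as supported."""
    if quantize is None:
        return  # No quantization requested, nothing to check.
    if quantize not in SUPPORTED_QUANTIZATIONS[framework]:
        raise ObjectHasInvalidValueException(
            f"Quantization {quantize.value} is not supported for "
            f"inference framework {framework.value}."
        )
```

Keeping the mapping as data means adding support for a new pairing is a one-line change, and the same table could be reused to generate the docs requested here.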

)
if num_shards > gpus:
    raise ObjectHasInvalidValueException(
        f"Num shard {num_shards} must be less than or equal to the number of GPUs {gpus}."
Contributor

nit: could mention the inference framework in the error msg
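
One way the suggestion could look, as a sketch only: the function name `validate_num_shards`, the `framework` string parameter, and the stand-in exception class are hypothetical, and the wording of the added clause is a guess rather than what was merged.

```python
class ObjectHasInvalidValueException(Exception):
    """Stand-in for the project's exception of the same name."""


def validate_num_shards(num_shards: int, framework: str, gpus: int) -> None:
    """Check the shard count against the GPU count, naming the framework on failure."""
    if num_shards > gpus:
        raise ObjectHasInvalidValueException(
            f"Num shard {num_shards} must be less than or equal to the number "
            f"of GPUs {gpus} for inference framework {framework}."
        )
```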

yunfeng-scale enabled auto-merge (squash) October 11, 2023 19:15
yunfeng-scale merged commit 60ac144 into main Oct 12, 2023
yunfeng-scale deleted the yunfeng-validate-quantization branch October 12, 2023 01:08
4 participants