Closed
Labels
ep:WebGPU (ort-web webgpu provider), model:transformer (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.), platform:web (issues related to ONNX Runtime web; typically submitted using template)
Description
Describe the issue
The following error occurs when trying to run https://huggingface.co/HuggingFaceTB/SmolVLM-Instruct on WebGPU.
Note that the CPU implementation runs correctly, so this is a bug in the WebGPU EP specifically. Moreover, the zero-dimension tensor is by design: it is used during the first generation step.
To reproduce
- Install and build Transformers.js from source (https://github.com/huggingface/transformers.js)
- Run the following code in-browser:
import {
  AutoProcessor,
  AutoModelForVision2Seq,
  load_image,
} from "@huggingface/transformers";

// Initialize processor and model
const model_id = "HuggingFaceTB/SmolVLM-Instruct";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await AutoModelForVision2Seq.from_pretrained(model_id, {
  dtype: {
    embed_tokens: "fp16", // "fp32", "fp16", "q8"
    vision_encoder: "q4", // "fp32", "fp16", "q8", "q4", "q4f16"
    decoder_model_merged: "q4", // "q8", "q4", "q4f16"
  },
  device: "webgpu",
});

// Load images
const image1 = await load_image("https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg");
const image2 = await load_image("https://huggingface.co/spaces/merve/chameleon-7b/resolve/main/bee.jpg");

// Create input messages
const messages = [
  {
    role: "user",
    content: [
      { type: "image" },
      { type: "image" },
      { type: "text", text: "Can you describe the two images?" },
    ],
  },
];

// Prepare inputs
const text = processor.apply_chat_template(messages, { add_generation_prompt: true });
const inputs = await processor(text, [image1, image2], {
  // Set `do_image_splitting: true` to split images into multiple patches.
  // NOTE: This uses more memory, but can provide more accurate results.
  do_image_splitting: false,
});

// Generate outputs
const generated_ids = await model.generate({
  ...inputs,
  max_new_tokens: 500,
});
const generated_texts = processor.batch_decode(
  generated_ids.slice(null, [inputs.input_ids.dims.at(-1), null]),
  { skip_special_tokens: true },
);
console.log(generated_texts[0]);
// ' In the first image, there is a green statue of liberty on a pedestal in the middle of the water. The water is surrounded by trees and buildings in the background. In the second image, there are pink and red flowers with a bee on the pink flower.'

Urgency
This blocks SmolVLM usage in Transformers.js.
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.20.1
Execution Provider
'webgpu' (WebGPU)