mlc_llm (0.19.0) does not support the Qwen3 model

Hi,

I want to run the Qwen3-4B model on my Orin Nano Super 8GB device. I downloaded mlc-ai/Qwen3-4B-q4f16_1-MLC from Hugging Face, but I ran into the following issue when running mlc_llm; it seems that the current mlc_llm doesn't support Qwen3:

python3 benchmark.py --model /root/.cache/mlc_llm/mlc-ai/Qwen3-4B-q4f16_1-MLC/ --max-num-prompts 4 --prompt ~/.cache/mlc_llm/jetson-containers-master/data/prompts/completion_1024.json --prefill-chunk-size 1024 --save Qwen3-4B-Instruct-MLC.csv
TVM version: 0.19.0
MLC version: 0.19.0

USE_NVTX: OFF
USE_GTEST: OFF
SUMMARIZE: ON
TVM_DEBUG_WITH_ABI_CHANGE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
CUDA_VERSION: 12.6
USE_LIBBACKTRACE: OFF
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_OPENCL_EXTN_QCOM: NOT-FOUND
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: OFF
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: ON
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_THRUST: ON
USE_CCACHE: ON
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM:
USE_OPENCL_GTEST: /path/to/opencl/gtest
TVM_LOG_BEFORE_THROW: OFF
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: ON
USE_GRAPH_EXECUTOR_CUDA_GRAPH: ON
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_MSCCL: OFF
USE_NNAPI_RUNTIME: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: /usr/bin/llvm-config --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: 7ed4584952546fa5d54366b72a6862f919c18daa
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: ON
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2024-12-15 09:56:40 -0500
USE_HIPBLAS: OFF
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: ON
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: OFF
USE_NNPACK: OFF
LLVM_VERSION: 17.0.6
USE_MRVL: OFF
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
USE_NNAPI_CODEGEN: OFF
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_FLASHINFER:
USE_CUBLAS: ON
USE_METAL: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_NVSHMEM: OFF
USE_HEXAGON_RPC: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: OFF
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: ON
TVM_CXX_COMPILER_PATH: /usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON
Namespace(model='/root/.cache/mlc_llm/mlc-ai/Qwen3-4B-q4f16_1-MLC/', model_lib_path=None, prompt=['/root/.cache/mlc_llm/jetson-containers-master/data/prompts/completion_1024.json'], chat=False, streaming=False, max_new_tokens=128, max_num_prompts=4, max_context_len=None, prefill_chunk_size=1024, save='Qwen3-4B-Instruct-MLC.csv')
-- loading /root/.cache/mlc_llm/mlc-ai/Qwen3-4B-q4f16_1-MLC/
Traceback (most recent call last):
  File "/root/.cache/mlc_llm/jetson-containers-master/packages/llm/mlc/benchmark.py", line 145, in <module>
    model = MLCEngine(args.model, model_lib=args.model_lib_path, mode='interactive', engine_config=cfg)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine.py", line 1466, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 590, in __init__
    ) = _process_model_args(models, device, engine_config)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 171, in _process_model_args
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 171, in <listcomp>
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 164, in _convert_model_info
    model_lib = jit.jit(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/jit.py", line 129, in jit
    "model_config": _get_model_config(),
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/jit.py", line 96, in _get_model_config
    return MODELS[model_type].config.from_dict(model_config).asdict()
KeyError: 'qwen3'
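
From what I can tell, the KeyError happens because mlc_llm reads the model_type field from the model's mlc-chat-config.json and looks it up in a registry of supported architectures (the MODELS table in the traceback above), and in 0.19.0 that registry has no "qwen3" entry. A simplified sketch of the pattern (illustrative only, not the actual mlc_llm source):

import json

# Simplified stand-in for mlc_llm's MODELS registry: architecture name -> model handlers.
MODELS = {
    "llama": ...,
    "qwen2": ...,
    # no "qwen3" key in 0.19.0; support was added in a later release
}

def get_model_config(model_dir: str):
    # mlc_llm reads mlc-chat-config.json from the compiled weights directory
    with open(f"{model_dir}/mlc-chat-config.json") as f:
        config = json.load(f)
    model_type = config["model_type"]  # "qwen3" for Qwen3-4B-q4f16_1-MLC
    return MODELS[model_type]          # -> KeyError: 'qwen3' on 0.19.0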

I tried to update mlc_llm, but I found that mlc-ai-nightly doesn't provide a build for CUDA 12.6. The available wheels are listed here: mlc.ai/wheels (Install MLC LLM Python Package).
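
For reference, the nightly install command on that page follows this pattern (the CUDA suffix has to match one of the published builds, which is the problem for CUDA 12.6; cu123 below is just an example):

python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-cu123 mlc-ai-nightly-cu123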

Could you please give me some advice on how to run the Qwen3-4B model?

Thank you very much for your assistance!

Best regards,
richer.chan

Hi,

We need to check this with our internal team and will get back to you with more info.
Thanks.


Hi @gang.cheng, please check jetson-containers/packages/llm/mlc at dev · dusty-nv/jetson-containers · GitHub, where we have compiled mlc (among many others) with CUDA 12.8 on Ubuntu 24.04, including mlc 0.20.0 with Qwen3 support.


Thanks @narandill @gang.cheng, here is the updated container image for MLC that supports Qwen3: dustynv/mlc:0.20.0-r36.4.0
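
To try it, something like this should work (the usual jetson-containers pattern; check the repo README for the exact flags):

jetson-containers run dustynv/mlc:0.20.0-r36.4.0

or with plain docker:

sudo docker run --runtime nvidia -it --rm --network host dustynv/mlc:0.20.0-r36.4.0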

We also posted updated images for vLLM, SGLang, ollama, and llama.cpp.


Hi @narandill @dusty_nv, thank you for your prompt response. I will report back with the results after trying it. Thank you again.

Hi @dusty_nv @narandill @AastaLLL, the Qwen3 model worked in the dustynv/mlc:0.20.0-r36.4.0 container, thanks a lot.

PROMPT: The weather forecast today is for a high of 100 degrees. I’m not sure if that’s a high for the day or a high for the week, but it’s hot.
I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today.
I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today.
I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today.
I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I

Okay, the user is talking about the weather being 100 degrees and how that affects their ability to work. They mention being unsure if they can get a lot of work done today, repeating the same sentence multiple times. It seems like they're feeling uncertain or maybe overwhelmed by the heat. I should acknowledge their concern about the weather affecting their productivity. Maybe offer some advice on how to cope with the heat, like staying hydrated, taking breaks, or working in a cooler environment. Also, reassure them that it's okay to not be able to do a lot of work if the weather is bad. Maybe suggest ways to

/root/.cache/mlc_llm/mlc-ai/Qwen3-4B-q4f16_1-MLC/: input=934 output=128 prefill_time 1.396 sec, prefill_rate 668.9 tokens/sec, decode_time 5.104 sec, decode_rate 25.1 tokens/sec

AVERAGE OVER 3 RUNS (input_tokens=959, output_tokens=128)
/root/.cache/mlc_llm/mlc-ai/Qwen3-4B-q4f16_1-MLC/: prefill_time 1.466 sec, prefill_rate 655.9 tokens/sec, decode_time 5.114 sec, decode_rate 25.0 tokens/sec

Peak memory usage: 1728.58 MB
Saved results to: Qwen3-4B-Instruct-q4f16_1-MLC.csv
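
As a sanity check on how those numbers are computed (values taken from the log above): prefill_rate is input tokens divided by prefill time, and decode_rate is output tokens divided by decode time.

prefill_rate = 934 / 1.396   # ~669 tokens/sec (log reports 668.9)
decode_rate  = 128 / 5.104   # ~25.1 tokens/sec (log reports 25.1)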

Best Regards,
richer.chan

