mlc_llm (0.19.0) does not support the Qwen3 model

Hi,

I want to run the Qwen3-4B model on my Orin Nano Super 8GB device. I downloaded mlc-ai/Qwen3-4B-q4f16_1-MLC from Hugging Face, but I ran into the following issue when running mlc_llm; it seems that the current mlc_llm doesn't support Qwen3:

python3 benchmark.py --model /root/.cache/mlc_llm/mlc-ai/Qwen3-4B-q4f16_1-MLC/ --max-num-prompts 4 --prompt ~/.cache/mlc_llm/jetson-containers-master/data/prompts/completion_1024.json --prefill-chunk-size 1024 --save Qwen3-4B-Instruct-MLC.csv
TVM version: 0.19.0
MLC version: 0.19.0

USE_NVTX: OFF
USE_GTEST: OFF
SUMMARIZE: ON
TVM_DEBUG_WITH_ABI_CHANGE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
CUDA_VERSION: 12.6
USE_LIBBACKTRACE: OFF
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_OPENCL_EXTN_QCOM: NOT-FOUND
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: OFF
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: ON
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_THRUST: ON
USE_CCACHE: ON
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM:
USE_OPENCL_GTEST: /path/to/opencl/gtest
TVM_LOG_BEFORE_THROW: OFF
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: ON
USE_GRAPH_EXECUTOR_CUDA_GRAPH: ON
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_MSCCL: OFF
USE_NNAPI_RUNTIME: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: /usr/bin/llvm-config --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: 7ed4584952546fa5d54366b72a6862f919c18daa
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: ON
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2024-12-15 09:56:40 -0500
USE_HIPBLAS: OFF
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: ON
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: OFF
USE_NNPACK: OFF
LLVM_VERSION: 17.0.6
USE_MRVL: OFF
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
USE_NNAPI_CODEGEN: OFF
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_FLASHINFER:
USE_CUBLAS: ON
USE_METAL: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_NVSHMEM: OFF
USE_HEXAGON_RPC: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: OFF
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: ON
TVM_CXX_COMPILER_PATH: /usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON
Namespace(model='/root/.cache/mlc_llm/mlc-ai/Qwen3-4B-q4f16_1-MLC/', model_lib_path=None, prompt=['/root/.cache/mlc_llm/jetson-containers-master/data/prompts/completion_1024.json'], chat=False, streaming=False, max_new_tokens=128, max_num_prompts=4, max_context_len=None, prefill_chunk_size=1024, save='Qwen3-4B-Instruct-MLC.csv')
-- loading /root/.cache/mlc_llm/mlc-ai/Qwen3-4B-q4f16_1-MLC/
Traceback (most recent call last):
  File "/root/.cache/mlc_llm/jetson-containers-master/packages/llm/mlc/benchmark.py", line 145, in <module>
    model = MLCEngine(args.model, model_lib=args.model_lib_path, mode='interactive', engine_config=cfg)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine.py", line 1466, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 590, in __init__
    ) = _process_model_args(models, device, engine_config)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 171, in _process_model_args
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 171, in <listcomp>
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 164, in _convert_model_info
    model_lib = jit.jit(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/jit.py", line 129, in jit
    "model_config": _get_model_config(),
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/jit.py", line 96, in _get_model_config
    return MODELS[model_type].config.from_dict(model_config).asdict()
KeyError: 'qwen3'
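
From what I can tell, the KeyError happens because mlc_llm reads the model_type field from the model's mlc-chat-config.json and looks it up in a registry of supported architectures (the MODELS table in the traceback above), and in 0.19.0 that registry has no "qwen3" entry. A simplified sketch of the pattern (illustrative only, not the actual mlc_llm source):

import json

# Simplified stand-in for mlc_llm's MODELS registry: architecture name -> model handlers.
MODELS = {
    "llama": ...,
    "qwen2": ...,
    # no "qwen3" key in 0.19.0; support was added in a later release
}

def get_model_config(model_dir: str):
    # mlc_llm reads mlc-chat-config.json from the compiled weights directory
    with open(f"{model_dir}/mlc-chat-config.json") as f:
        config = json.load(f)
    model_type = config["model_type"]  # "qwen3" for Qwen3-4B-q4f16_1-MLC
    return MODELS[model_type]          # -> KeyError: 'qwen3' on 0.19.0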

I tried to update mlc_llm, but I found that mlc-ai-nightly doesn't provide a build for CUDA 12.6. The available wheels are listed here: mlc.ai/wheels (Install MLC LLM Python Package).
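
For reference, the nightly install command on that page follows this pattern (the CUDA suffix has to match one of the published builds, which is the problem for CUDA 12.6; cu123 below is just an example):

python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-cu123 mlc-ai-nightly-cu123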

Could you please give me some advice on how to run the Qwen3-4B model?

Thank you very much for your assistance!

Best regards,
richer.chan

Hi,

We need to check this with our internal team and will get back to you with more info.
Thanks.


Hi @gang.cheng, please check jetson-containers/packages/llm/mlc at dev · dusty-nv/jetson-containers · GitHub, where we have compiled mlc (among many others) with CUDA 12.8 on Ubuntu 24.04, including mlc 0.20.0 with Qwen3 support.


Thanks @narandill @gang.cheng, here is the updated container image for MLC that supports Qwen3: dustynv/mlc:0.20.0-r36.4.0
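
To try it, something like this should work (the usual jetson-containers pattern; check the repo README for the exact flags):

jetson-containers run dustynv/mlc:0.20.0-r36.4.0

or with plain docker:

sudo docker run --runtime nvidia -it --rm --network host dustynv/mlc:0.20.0-r36.4.0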

We also posted updated images for vLLM, SGLang, ollama, and llama.cpp.


Hi @narandill @dusty_nv, thank you for your prompt response. I will report back with the results after trying it. Thank you again.

Hi @dusty_nv @narandill @AastaLLL, the Qwen3 model worked in the dustynv/mlc:0.20.0-r36.4.0 container, thanks a lot.

PROMPT: The weather forecast today is for a high of 100 degrees. I’m not sure if that’s a high for the day or a high for the week, but it’s hot.
I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today.
I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today.
I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today.
I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I’m going to be able to get a lot of work done today. I’m going to try to get some work done, but I’m not sure if I’m going to be able to get a lot of work done today. I’m not sure if I

Okay, the user is talking about the weather being 100 degrees and how that affects their ability to work. They mention being unsure if they can get a lot of work done today, repeating the same sentence multiple times. It seems like they're feeling uncertain or maybe overwhelmed by the heat. I should acknowledge their concern about the weather affecting their productivity. Maybe offer some advice on how to cope with the heat, like staying hydrated, taking breaks, or working in a cooler environment. Also, reassure them that it's okay to not be able to do a lot of work if the weather is bad. Maybe suggest ways to

/root/.cache/mlc_llm/mlc-ai/Qwen3-4B-q4f16_1-MLC/: input=934 output=128 prefill_time 1.396 sec, prefill_rate 668.9 tokens/sec, decode_time 5.104 sec, decode_rate 25.1 tokens/sec

AVERAGE OVER 3 RUNS (input_tokens=959, output_tokens=128)
/root/.cache/mlc_llm/mlc-ai/Qwen3-4B-q4f16_1-MLC/: prefill_time 1.466 sec, prefill_rate 655.9 tokens/sec, decode_time 5.114 sec, decode_rate 25.0 tokens/sec

Peak memory usage: 1728.58 MB
Saved results to: Qwen3-4B-Instruct-q4f16_1-MLC.csv
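
As a sanity check on how those numbers are computed (values taken from the log above): prefill_rate is input tokens divided by prefill time, and decode_rate is output tokens divided by decode time.

prefill_rate = 934 / 1.396   # ~669 tokens/sec (log reports 668.9)
decode_rate  = 128 / 5.104   # ~25.1 tokens/sec (log reports 25.1)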

Best Regards,
richer.chan

