Hi,
I want to run the Qwen3-4B model on my Jetson Orin Nano Super (8GB) device. I downloaded mlc-ai/Qwen3-4B-q4f16_1-MLC from Hugging Face, but I ran into the following issue when running mlc_llm; it seems that the current mlc_llm does not support Qwen3:
python3 benchmark.py --model /root/.cache/mlc_llm/mlc-ai/Qwen3-4B-q4f16_1-MLC/ --max-num-prompts 4 --prompt ~/.cache/mlc_llm/jetson-containers-master/data/prompts/completion_1024.json --prefill-chunk-size 1024 --save Qwen3-4B-Instruct-MLC.csv
TVM version: 0.19.0
MLC version: 0.19.0
USE_NVTX: OFF
USE_GTEST: OFF
SUMMARIZE: ON
TVM_DEBUG_WITH_ABI_CHANGE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
CUDA_VERSION: 12.6
USE_LIBBACKTRACE: OFF
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_OPENCL_EXTN_QCOM: NOT-FOUND
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: OFF
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: ON
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_THRUST: ON
USE_CCACHE: ON
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM:
USE_OPENCL_GTEST: /path/to/opencl/gtest
TVM_LOG_BEFORE_THROW: OFF
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: ON
USE_GRAPH_EXECUTOR_CUDA_GRAPH: ON
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_MSCCL: OFF
USE_NNAPI_RUNTIME: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: /usr/bin/llvm-config --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: 7ed4584952546fa5d54366b72a6862f919c18daa
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: ON
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2024-12-15 09:56:40 -0500
USE_HIPBLAS: OFF
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: ON
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: OFF
USE_NNPACK: OFF
LLVM_VERSION: 17.0.6
USE_MRVL: OFF
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
USE_NNAPI_CODEGEN: OFF
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_FLASHINFER:
USE_CUBLAS: ON
USE_METAL: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_NVSHMEM: OFF
USE_HEXAGON_RPC: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: OFF
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: ON
TVM_CXX_COMPILER_PATH: /usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON
Namespace(model='/root/.cache/mlc_llm/mlc-ai/Qwen3-4B-q4f16_1-MLC/', model_lib_path=None, prompt=['/root/.cache/mlc_llm/jetson-containers-master/data/prompts/completion_1024.json'], chat=False, streaming=False, max_new_tokens=128, max_num_prompts=4, max_context_len=None, prefill_chunk_size=1024, save='Qwen3-4B-Instruct-MLC.csv')
-- loading /root/.cache/mlc_llm/mlc-ai/Qwen3-4B-q4f16_1-MLC/
Traceback (most recent call last):
  File "/root/.cache/mlc_llm/jetson-containers-master/packages/llm/mlc/benchmark.py", line 145, in <module>
    model = MLCEngine(args.model, model_lib=args.model_lib_path, mode='interactive', engine_config=cfg)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine.py", line 1466, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 590, in __init__
    ) = _process_model_args(models, device, engine_config)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 171, in _process_model_args
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 171, in <listcomp>
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 164, in _convert_model_info
    model_lib = jit.jit(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/jit.py", line 129, in jit
    "model_config": _get_model_config(),
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/jit.py", line 96, in _get_model_config
    return MODELS[model_type].config.from_dict(model_config).asdict()
KeyError: 'qwen3'
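For reference, here is a minimal sketch of how the missing model type can be confirmed, assuming the MODELS registry indexed in mlc_llm/interface/jit.py (see the last frame above) is the one exported by mlc_llm.model:

# Minimal sketch: list the model types registered in the installed mlc_llm.
# Assumption: mlc_llm.model.MODELS is the registry indexed at jit.py line 96.
from mlc_llm.model import MODELS

print(sorted(MODELS.keys()))  # 'qwen3' is absent, matching the KeyError above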
I tried to update mlc_llm, but I found that the mlc-ai-nightly wheels do not provide a build for CUDA 12.6. The wheels page shows this: mlc.ai/wheels (the "Install MLC LLM Python Package" page in the mlc-llm documentation).
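In case it helps, here is a small sanity-check sketch for the CUDA build version; as far as I know, tvm.support.libinfo() returns the same build-flag table that is dumped at the top of this post:

# Sanity check: which CUDA toolkit the installed TVM runtime was built against.
# tvm.support.libinfo() returns the build-flag dictionary printed above.
import tvm

info = tvm.support.libinfo()
print(info.get("CUDA_VERSION"))  # expected to print 12.6, per the log above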
Could you please give me some advice on how to run the Qwen3-4B model?
Thank you very much for your assistance!
Best regards,
richer.chan