A repository accompanying the survey Edge AI Meets LLM (coming soon), containing a comprehensive list of papers, codebases, toolchains, and open-source frameworks. It is intended to serve as a handbook for researchers and developers interested in Edge/Mobile LLMs.
May 23, 2025: Uploaded a comprehensive collection of frameworks & benchmarks, commercial products & applications, and models, and added papers to the frameworks section.
Figure: Timeline showcasing the evolution and emergence of Edge/Mobile LLMs, highlighting key milestones and developments in the field.
Link legend (icons used in the tables below):
- Paper
- Official website
- GitHub repo
- Hugging Face link
- Highlights / short name
- Survey
Deployment Frameworks
Commercial Products and Applications
| Framework | Backend | Device Support | Model Family | Model Size | Organization |
|---|---|---|---|---|---|
| llama.cpp | CUDA, HIP, SYCL, OpenCL, MUSA, Vulkan, RPC, BLAS, BLIS, CANN, Metal | CPU: x86_64, ARM; GPU: Intel, Nvidia, MTT, Adreno, AMD; NPU: Ascend; Apple Silicon | Phi, Gemma, Qwen, OpenELM, MiniCPM, GLM-edge | 0.5B, 1.5B | ggml |
| ollama (based on llama.cpp) | CUDA, Metal | CPU: x86_64; Apple M-series | DeepSeek-R1, Gemma, LLaMA, Phi, Mistral, LLaVA, QwQ | 1B, 3B, 3.8B, 4B, 7B | ollama |
| vLLM | CUDA, HIP, SYCL, AWS Neuron | CPU: AMD, Intel, PowerPC; GPU: Nvidia, AMD, Intel; TPU | Gemma, Qwen, Phi, MiniCPM | 1B, 1.2B | UC Berkeley |
| MLC-LLM | CUDA, Vulkan, OpenCL, Metal | CPU: x86_64, ARM; GPU: Nvidia; Apple Silicon | LLaMA | 3B | MLC |
| MNN-LLM | HIAI, CoreML, OpenCL, CUDA, Vulkan, Metal | CPU: x86_64, ARM; GPU: Nvidia; NPU: Ascend, ANE; Apple Silicon | Qwen, Zhipu, Baichuan | 0.5B, 1B, 1.5B, 2B | Alibaba |
| PowerInfer | CUDA, Metal | CPU: x86_64; GPU: Nvidia; Apple Silicon | Falcon, Bamboo | 7B | Shanghai Jiao Tong University |
| ExecuTorch | XNNPACK, Vulkan, ARM Ethos-U, CoreML, MediaTek, MPS, CUDA, Qualcomm AI Engine Direct SDK | CPU: ARM; GPU: Nvidia; NPU: ANE | LLaMA | 1B, 3B | PyTorch |
| MediaPipe | CUDA | CPU: x86_64, ARM; GPU: Nvidia | Gemma, Falcon, Phi, StableLM | 1B, 2B | Google |
| OpenPPL | CUDA, CANN | CPU: x86_64, ARM; GPU: Nvidia; NPU: Ascend, Hexagon, Cambricon | ChatGLM, Baichuan, InternLM | 7B | SenseTime |
| OpenVINO | CUDA | CPU, GPU, NPU, FPGA | Phi, Gemma, Qwen, MiniCPM, GLM-edge | 0.5B, 1B | Intel |
| ONNX Runtime | CUDA | CPU, GPU, FPGA | Phi, LLaMA | 1B | Microsoft |
| mllm-NPU | CUDA, QNN | CPU: x86_64, ARM; GPU: Nvidia; NPU | Phi, Gemma, Qwen, MiniCPM, OpenELM | 0.5B, 1B, 1.1B, 1.5B | BUPT, PKU |
| FastLLM | CUDA | CPU: x86_64, ARM; GPU: Nvidia | Qwen, LLaMA | 1B | ServiceNow |
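A common thread across the runtimes above is low-bit weight quantization (e.g. the int4/int8 GGUF variants served by llama.cpp and ollama), which shrinks the listed model sizes to fit edge memory budgets. A toy sketch of the core idea, symmetric int8 quantization, in plain Python (function names are ours, not any framework's API):

```python
# Toy symmetric int8 quantization of one weight tensor (a flat list of
# floats). Real runtimes quantize per block/channel and at 4-8 bits;
# this sketch uses a single scale for the whole tensor.

def quantize_int8(weights):
    """Return (int8 values, scale) such that w ~= q * scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid scale 0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the quantized form."""
    return [v * scale for v in q]
```

Each weight is then stored in one byte instead of four, at the cost of a rounding error bounded by half a quantization step.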
| Framework | Organization | Core Features | Links |
|---|---|---|---|
| Qualcomm AI Engine Direct SDK | Qualcomm | Backend: CPU (Kryo), GPU (Adreno), DSP (Hexagon); Devices: Snapdragon 8 Gen 2/3/Elite; Features: deployment of 130+ models, automatic model conversion, PyTorch/ONNX support | |
| NeuroPilot | MediaTek | Backend: CPU, GPU, APU; Devices: Dimensity series; Features: support for mainstream AI frameworks, complete toolchain, models from 1B to 33B parameters | |
| MLX | Apple | Backend: Metal; Devices: M-series chips; Features: unified memory architecture, text/image generation, low power consumption | |
| Google AI Edge SDK | Google | Backend: TPU; Devices: Tensor G series; Features: fast integration of AI capabilities | |
| TensorRT-LLM | NVIDIA | Backend: CUDA; Devices: Jetson series; Features: dynamic batching, paged KV cache, quantization, speculative decoding | |
| OpenVINO | Intel | Backend: CPU, GPU, VPU; Devices: Intel processors/graphics; Features: hardware-algorithm co-optimization | |
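Among the features listed above, TensorRT-LLM's paged KV cache is worth a brief illustration: instead of reserving one contiguous buffer per sequence, key/value entries are stored in fixed-size blocks drawn from a shared pool, so memory is allocated on demand and reclaimed exactly when a sequence finishes. A toy sketch in plain Python (block size and class/method names are invented for illustration, not TensorRT-LLM's actual interface):

```python
# Toy paged KV cache: a shared pool of fixed-size blocks plus a
# per-sequence block table mapping logical token positions to blocks.

BLOCK_SIZE = 4  # tokens per block (real systems use e.g. 16 or 64)


class PagedKVCache:
    def __init__(self, num_blocks):
        self.blocks = [[None] * BLOCK_SIZE for _ in range(num_blocks)]
        self.free = list(range(num_blocks))  # ids of unallocated blocks
        self.tables = {}                     # seq_id -> list of block ids
        self.lengths = {}                    # seq_id -> tokens stored

    def append(self, seq_id, kv_entry):
        """Store one token's (key, value) pair for a sequence."""
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:              # current block full: grab a new one
            if not self.free:
                raise MemoryError("cache exhausted")
            self.tables.setdefault(seq_id, []).append(self.free.pop())
        block_id = self.tables[seq_id][n // BLOCK_SIZE]
        self.blocks[block_id][n % BLOCK_SIZE] = kv_entry
        self.lengths[seq_id] = n + 1

    def read(self, seq_id):
        """Return the sequence's KV entries in logical token order."""
        n = self.lengths.get(seq_id, 0)
        table = self.tables.get(seq_id, [])
        return [self.blocks[table[t // BLOCK_SIZE]][t % BLOCK_SIZE]
                for t in range(n)]

    def release(self, seq_id):
        """Return all blocks of a finished sequence to the pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because blocks are allocated one at a time, per-sequence waste is bounded by a single partially filled block, which is what lets such runtimes batch many concurrent sequences on memory-constrained devices.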
Note: This table is taken from the safetensors Git repository; more detailed information can be found there.
Figure: File format illustrations reference safetensors and GGUF.
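The safetensors on-disk layout referenced above is deliberately simple: an 8-byte little-endian header length, a JSON header describing each tensor's dtype, shape, and byte offsets into the data section, then the raw tensor bytes. A minimal pure-stdlib sketch of writing a file and parsing its header (toy data only; it omits the format's optional `__metadata__` entry and header padding):

```python
import json
import struct


def write_safetensors(path, tensors):
    """tensors: dict name -> (dtype_str, shape, raw little-endian bytes)."""
    header, blobs, offset = {}, [], 0
    for name, (dtype, shape, raw) in tensors.items():
        # data_offsets are relative to the start of the data section
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        blobs.append(raw)
        offset += len(raw)
    head = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(head)))  # 8-byte header length
        f.write(head)
        for raw in blobs:
            f.write(raw)


def read_safetensors_header(path):
    """Parse the JSON header without loading any tensor data."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n).decode("utf-8"))
```

Because the header alone describes every tensor's location, a loader can memory-map just the slices it needs, which is one reason the format suits edge deployment.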
- V.A. High-Speed Computation Kernels
- V.B. Graph Optimization
- V.C. Memory Optimization
- V.D. Pipeline Optimization
- V.E. Multi-device Collaboration
- V.F. Cloud-Edge Collaboration
This project is open-source and available under the MIT License. See the LICENSE file for more details.