Awesome-Edge-LLMs


This repository accompanies the survey Edge AI Meets LLM (coming soon). It contains a comprehensive list of papers, codebases, toolchains, and open-source frameworks, and is intended to serve as a handbook for researchers and developers interested in Edge/Mobile LLMs.

πŸ“’ News

  • May 23, 2025: Uploaded a comprehensive collection covering frameworks & benchmarks, commercial products & applications, and models, and added papers to the frameworks section.

LLM Emergence Timeline
Figure: Timeline showcasing the evolution and emergence of Edge/Mobile LLMs, highlighting key milestones and developments in the field.

🌈 Tags Convention

  • πŸ”— Hyperlinks: Paper badge, Official website badge, GitHub repo badge, Hugging Face badge

  • πŸ’‘ Highlights: Short-name badge, Survey badge

Contents

  • πŸ”¨ Deployment Frameworks

  • πŸ“± Commercial Products and Applications

  • πŸ“‘ Paper Lists

I. Open Source Frameworks and Benchmarks

I.A. End-to-End Frameworks

I.A.1. Open Source Frameworks

| Framework | Backend | Device Support | Model Family | Model Size | Organization |
|---|---|---|---|---|---|
| llama.cpp | CUDA, HIP, SYCL, OpenCL, MUSA, Vulkan, RPC, BLAS, BLIS, CANN, Metal | CPU: x86_64, ARM; GPU: Intel, Nvidia, MTT, Adreno, AMD; NPU: Ascend; Apple Silicon | Phi, Gemma, Qwen, OpenELM, MiniCPM, GLM-edge | 0.5B, 1.5B | ggml |
| ollama (based on llama.cpp) | CUDA, Metal | CPU: x86_64, Apple-M | DeepSeek-R1, Gemma, LLaMA, Phi, Mistral, LLaVA, QwQ | 1B, 3B, 3.8B, 4B, 7B | ollama |
| vLLM | CUDA, HIP, SYCL, AWS Neuron | CPU: AMD, Intel, PowerPC; GPU: Nvidia, AMD, Intel; TPU | Gemma, Qwen, Phi, MiniCPM | 1B, 1.2B | UC Berkeley |
| MLC-LLM | CUDA, Vulkan, OpenCL, Metal | CPU: x86_64, ARM; GPU: Nvidia; Apple Silicon | LLaMA | 3B | MLC |
| MNN-LLM | HIAI, CoreML, OpenCL, CUDA, Vulkan, Metal | CPU: x86_64, ARM; GPU: Nvidia; NPU: Ascend, ANE, Apple Silicon | Qwen, Zhipu, Baichuan | 0.5B, 1B, 1.5B, 2B | Alibaba |
| PowerInfer | CUDA, Metal | CPU: x86_64; GPU: Nvidia; Apple Silicon | Falcon, Bamboo | 7B | Shanghai Jiao Tong University |
| ExecuTorch | XNNPACK, Vulkan, ARM Ethos-U, CoreML, MediaTek, MPS, CUDA, Qualcomm AI Engine Direct SDK | CPU: ARM; GPU: Nvidia; NPU: ANE | LLaMA | 1B, 3B | PyTorch |
| MediaPipe | CUDA | CPU: x86_64, ARM; GPU: Nvidia | Gemma, Falcon, Phi, StableLM | 1B, 2B | Google |
| OpenPPL | CUDA, CANN | CPU: x86_64, ARM; GPU: Nvidia; NPU: Ascend, Hexagon, Cambricon | ChatGLM, Baichuan, InternLM | 7B | SenseTime |
| OpenVINO | CUDA | CPU, GPU, NPU, FPGA | Phi, Gemma, Qwen, MiniCPM, GLM-edge | 0.5B, 1B | Intel |
| ONNX Runtime | CUDA | CPU, GPU, FPGA | Phi, LLaMA | 1B | Microsoft |
| mllm-NPU | CUDA, QNN | CPU: x86_64, ARM; GPU: Nvidia; NPU | Phi, Gemma, Qwen, MiniCPM, OpenELM | 0.5B, 1B, 1.1B, 1.5B | BUPT, PKU |
| FastLLM | CUDA | CPU: x86_64, ARM; GPU: Nvidia | Qwen, LLaMA | 1B | ServiceNow |

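Most of the CPU/GPU frameworks above follow a similar workflow: obtain a quantized checkpoint (e.g., a GGUF file), point the runtime at it, and generate text locally. As a quick orientation, here is a minimal sketch using llama.cpp through its Python bindings (llama-cpp-python); the GGUF file name is a placeholder, and installation (`pip install llama-cpp-python`) plus a locally downloaded checkpoint are assumed.

```python
# Minimal sketch: on-device text generation with llama.cpp via llama-cpp-python.
# The model path is hypothetical -- substitute any GGUF checkpoint you have locally.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/qwen2-0_5b-instruct-q4_k_m.gguf",  # placeholder file name
    n_ctx=2048,    # context window
    n_threads=4,   # CPU threads; tune for the target device
)

output = llm(
    "Q: Name one advantage of running an LLM on a phone.\nA:",
    max_tokens=64,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```
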
I.A.2. Native Deployment Frameworks by Vendors

| Framework | Organization | Core Features | Links |
|---|---|---|---|
| Qualcomm AI Engine Direct SDK | Qualcomm | Backend: CPU (Kryo), GPU (Adreno), DSP (Hexagon); Device: Snapdragon 8 Gen 2/3/Elite; Features: supports 130+ model deployments, automatic model conversion, PyTorch/ONNX support | badge |
| NeuroPilot | MediaTek | Backend: CPU, GPU, APU; Device: Dimensity series; Features: supports mainstream AI frameworks, complete toolchain, supports 1B–33B parameter models | badge |
| MLX | Apple | Backend: Metal; Device: M-series chips; Features: unified memory architecture, text/image generation, low power consumption | badge |
| Google AI Edge SDK | Google | Backend: TPU; Device: Tensor G series; Features: fast integration of AI capabilities | badge |
| TensorRT-LLM | NVIDIA | Backend: CUDA; Device: Jetson series; Features: dynamic batching, paged KV cache, quantization, speculative decoding | badge |
| OpenVINO | Intel | Backend: CPU, GPU, VPU; Device: Intel processors/graphics; Features: hardware-algorithm co-optimization | badge badge |

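For the Intel stack listed above, a minimal OpenVINO GenAI sketch might look like the following. This assumes the `openvino-genai` package and a model directory already converted to OpenVINO IR (e.g., via optimum-intel); the directory path is a placeholder.

```python
# Minimal sketch: text generation with OpenVINO GenAI on Intel hardware.
# Assumes `pip install openvino-genai` and an IR-format model directory.
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-ov", "CPU")  # or "GPU"/"NPU"
print(pipe.generate("What is edge AI?", max_new_tokens=64))
```
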
πŸ‘†πŸ»Back to Contents

I.B. Performance Benchmarks

I.B.1. General Benchmarks for Edge LLM

  • Open LLM Leaderboard for Edge Devices badge

  • Open LLM Leaderboard for Consumers badge

I.B.2. LLM Compression Benchmarks

  • LLM Compression Benchmark badge
  • LLMCBench badge badge

πŸ‘†πŸ»Back to Contents

I.C. Model Export Format

| Format | Safe | Zero-copy | Lazy loading | No file size limit | Layout control | Flexibility | Bfloat16/Fp8 |
|---|---|---|---|---|---|---|---|
| GGUF (ggml-org) | βœ”οΈ | βœ”οΈ | βœ”οΈ | ~ | ~ | βœ”οΈ | βœ”οΈ |
| pickle (PyTorch) | βœ— | βœ— | βœ— | βœ”οΈ | βœ— | βœ”οΈ | βœ”οΈ |
| H5 (TensorFlow) | βœ”οΈ | βœ— | βœ”οΈ | βœ”οΈ | ~ | ~ | βœ— |
| SavedModel (TensorFlow) | βœ”οΈ | βœ— | βœ— | βœ”οΈ | βœ”οΈ | βœ— | βœ”οΈ |
| MsgPack (flax) | βœ”οΈ | βœ”οΈ | βœ— | βœ”οΈ | βœ— | βœ— | βœ”οΈ |
| Protobuf (ONNX) | βœ”οΈ | βœ— | βœ— | βœ— | βœ— | βœ— | βœ”οΈ |
| Cap'n'Proto | βœ”οΈ | βœ”οΈ | ~ | βœ”οΈ | βœ”οΈ | ~ | βœ— |
| llamafile (Mozilla) | βœ”οΈ | βœ— | βœ— | βœ— | ~ | ~ | βœ”οΈ |
| Numpy (npy, npz) | βœ”οΈ | ? | ? | βœ— | βœ”οΈ | βœ— | βœ— |
| pdparams (Paddle) | βœ— | βœ— | βœ— | βœ”οΈ | βœ— | βœ”οΈ | βœ”οΈ |
| SafeTensors | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ— | βœ”οΈ |

πŸ’‘ Note: This table is adapted from the safetensors GitHub repository, where more detailed information can be found.

model export format
Figure: File format illustrations reference safetensors and GGUF.
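
As a small illustration of the zero-copy and lazy-loading properties in the table above, here is a minimal sketch of the SafeTensors PyTorch API; the tensor names and file name are illustrative only, and an installed `safetensors` package is assumed.

```python
# Minimal sketch: round-tripping tensors through the safetensors format.
import torch
from safetensors import safe_open
from safetensors.torch import save_file, load_file

weights = {
    "embedding.weight": torch.randn(100, 16),  # illustrative tensor names
    "lm_head.weight": torch.randn(16, 100),
}
save_file(weights, "toy_model.safetensors")

# Load everything back at once...
restored = load_file("toy_model.safetensors")

# ...or open the file lazily and materialize only one tensor.
with safe_open("toy_model.safetensors", framework="pt") as f:
    head = f.get_tensor("lm_head.weight")
```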

πŸ‘†πŸ»Back to Contents

Paper Lists

IV. Algorithms (TBC)

V. Frameworks

VI. Hardware (TBC)

πŸ“„ License

This project is open-source and available under the MIT License. See the LICENSE file for more details.

πŸ‘†πŸ»Back to Contents
