A repository accompanying the survey Edge AI Meets LLM (coming soon), containing a comprehensive list of papers, codebases, toolchains, and open-source frameworks. It is intended to serve as a handbook for researchers and developers interested in Edge/Mobile LLMs.
May 23, 2025: Uploaded a comprehensive collection of frameworks & benchmarks, commercial products & applications, and models, and added papers to the frameworks section.
Figure: Timeline showcasing the evolution and emergence of Edge/Mobile LLMs, highlighting key milestones and developments in the field.
Link legend (icons used in the tables below):
- Paper
- Official website
- GitHub repo
- Hugging Face link
- Highlights / short name
- Survey
Deployment Frameworks
Commercial Products and Applications
| Framework | Backend | Device Support | Model Family | Model Size | Organization |
|---|---|---|---|---|---|
| llama.cpp | CUDA, HIP, SYCL, OpenCL, MUSA, Vulkan, RPC, BLAS, BLIS, CANN, Metal | CPU: x86_64, ARM; GPU: Intel, Nvidia, MTT, Adreno, AMD; NPU: Ascend; Apple Silicon | Phi, Gemma, Qwen, OpenELM, MiniCPM, GLM-edge | 0.5B, 1.5B | ggml |
| ollama (based on llama.cpp) | CUDA, Metal | CPU: x86_64; Apple M-series | DeepSeek-R1, Gemma, LLaMA, Phi, Mistral, LLaVA, QwQ | 1B, 3B, 3.8B, 4B, 7B | ollama |
| vLLM | CUDA, HIP, SYCL, AWS Neuron | CPU: AMD, Intel, PowerPC; GPU: Nvidia, AMD, Intel; TPU | Gemma, Qwen, Phi, MiniCPM | 1B, 1.2B | UC Berkeley |
| MLC-LLM | CUDA, Vulkan, OpenCL, Metal | CPU: x86_64, ARM; GPU: Nvidia; Apple Silicon | LLaMA | 3B | MLC |
| MNN-LLM | HIAI, CoreML, OpenCL, CUDA, Vulkan, Metal | CPU: x86_64, ARM; GPU: Nvidia; NPU: Ascend, ANE; Apple Silicon | Qwen, Zhipu, Baichuan | 0.5B, 1B, 1.5B, 2B | Alibaba |
| PowerInfer | CUDA, Metal | CPU: x86_64; GPU: Nvidia; Apple Silicon | Falcon, Bamboo | 7B | Shanghai Jiao Tong University |
| ExecuTorch | XNNPACK, Vulkan, ARM Ethos-U, CoreML, MediaTek, MPS, CUDA, Qualcomm AI Engine Direct SDK | CPU: ARM; GPU: Nvidia; NPU: ANE | LLaMA | 1B, 3B | PyTorch |
| MediaPipe | CUDA | CPU: x86_64, ARM; GPU: Nvidia | Gemma, Falcon, Phi, StableLM | 1B, 2B | Google |
| OpenPPL | CUDA, CANN | CPU: x86_64, ARM; GPU: Nvidia; NPU: Ascend, Hexagon, Cambricon | ChatGLM, Baichuan, InternLM | 7B | SenseTime |
| OpenVINO | CUDA | CPU, GPU, NPU, FPGA | Phi, Gemma, Qwen, MiniCPM, GLM-edge | 0.5B, 1B | Intel |
| ONNX Runtime | CUDA | CPU, GPU, FPGA | Phi, LLaMA | 1B | Microsoft |
| mllm-NPU | CUDA, QNN | CPU: x86_64, ARM; GPU: Nvidia; NPU | Phi, Gemma, Qwen, MiniCPM, OpenELM | 0.5B, 1B, 1.1B, 1.5B | BUPT, PKU |
| FastLLM | CUDA | CPU: x86_64, ARM; GPU: Nvidia | Qwen, LLaMA | 1B | ServiceNow |
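A common thread across the runtimes above is low-bit weight quantization (e.g. the int4/int8 GGUF variants served by llama.cpp and ollama), which shrinks the listed model sizes to fit edge memory budgets. A toy sketch of the core idea, symmetric int8 quantization, in plain Python (function names are ours, not any framework's API):

```python
# Toy symmetric int8 quantization of one weight tensor (a flat list of
# floats). Real runtimes quantize per block/channel and at 4-8 bits;
# this sketch uses a single scale for the whole tensor.

def quantize_int8(weights):
    """Return (int8 values, scale) such that w ~= q * scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid scale 0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the quantized form."""
    return [v * scale for v in q]
```

Each weight is then stored in one byte instead of four, at the cost of a rounding error bounded by half a quantization step.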
| Framework | Organization | Core Features | Links |
|---|---|---|---|
| Qualcomm AI Engine Direct SDK | Qualcomm | Backend: CPU (Kryo), GPU (Adreno), DSP (Hexagon); Devices: Snapdragon 8 Gen 2/3/Elite; Features: deployment of 130+ models, automatic model conversion, PyTorch/ONNX support | |
| NeuroPilot | MediaTek | Backend: CPU, GPU, APU; Devices: Dimensity series; Features: support for mainstream AI frameworks, complete toolchain, models from 1B to 33B parameters | |
| MLX | Apple | Backend: Metal; Devices: M-series chips; Features: unified memory architecture, text/image generation, low power consumption | |
| Google AI Edge SDK | Google | Backend: TPU; Devices: Tensor G series; Features: fast integration of AI capabilities | |
| TensorRT-LLM | NVIDIA | Backend: CUDA; Devices: Jetson series; Features: dynamic batching, paged KV cache, quantization, speculative decoding | |
| OpenVINO | Intel | Backend: CPU, GPU, VPU; Devices: Intel processors/graphics; Features: hardware-algorithm co-optimization | |
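Among the features listed above, TensorRT-LLM's paged KV cache is worth a brief illustration: instead of reserving one contiguous buffer per sequence, key/value entries are stored in fixed-size blocks drawn from a shared pool, so memory is allocated on demand and reclaimed exactly when a sequence finishes. A toy sketch in plain Python (block size and class/method names are invented for illustration, not TensorRT-LLM's actual interface):

```python
# Toy paged KV cache: a shared pool of fixed-size blocks plus a
# per-sequence block table mapping logical token positions to blocks.

BLOCK_SIZE = 4  # tokens per block (real systems use e.g. 16 or 64)


class PagedKVCache:
    def __init__(self, num_blocks):
        self.blocks = [[None] * BLOCK_SIZE for _ in range(num_blocks)]
        self.free = list(range(num_blocks))  # ids of unallocated blocks
        self.tables = {}                     # seq_id -> list of block ids
        self.lengths = {}                    # seq_id -> tokens stored

    def append(self, seq_id, kv_entry):
        """Store one token's (key, value) pair for a sequence."""
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:              # current block full: grab a new one
            if not self.free:
                raise MemoryError("cache exhausted")
            self.tables.setdefault(seq_id, []).append(self.free.pop())
        block_id = self.tables[seq_id][n // BLOCK_SIZE]
        self.blocks[block_id][n % BLOCK_SIZE] = kv_entry
        self.lengths[seq_id] = n + 1

    def read(self, seq_id):
        """Return the sequence's KV entries in logical token order."""
        n = self.lengths.get(seq_id, 0)
        table = self.tables.get(seq_id, [])
        return [self.blocks[table[t // BLOCK_SIZE]][t % BLOCK_SIZE]
                for t in range(n)]

    def release(self, seq_id):
        """Return all blocks of a finished sequence to the pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because blocks are allocated one at a time, per-sequence waste is bounded by a single partially filled block, which is what lets such runtimes batch many concurrent sequences on memory-constrained devices.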
Note: This table is taken from the safetensors Git repository; more detailed information can be found there.
Figure: File format illustrations reference safetensors and GGUF.
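The safetensors on-disk layout referenced above is deliberately simple: an 8-byte little-endian header length, a JSON header describing each tensor's dtype, shape, and byte offsets into the data section, then the raw tensor bytes. A minimal pure-stdlib sketch of writing a file and parsing its header (toy data only; it omits the format's optional `__metadata__` entry and header padding):

```python
import json
import struct


def write_safetensors(path, tensors):
    """tensors: dict name -> (dtype_str, shape, raw little-endian bytes)."""
    header, blobs, offset = {}, [], 0
    for name, (dtype, shape, raw) in tensors.items():
        # data_offsets are relative to the start of the data section
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        blobs.append(raw)
        offset += len(raw)
    head = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(head)))  # 8-byte header length
        f.write(head)
        for raw in blobs:
            f.write(raw)


def read_safetensors_header(path):
    """Parse the JSON header without loading any tensor data."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n).decode("utf-8"))
```

Because the header alone describes every tensor's location, a loader can memory-map just the slices it needs, which is one reason the format suits edge deployment.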
- V.A. High-Speed Computation Kernels
- V.B. Graph Optimization
- V.C. Memory Optimization
- V.D. Pipeline Optimization
- V.E. Multi-device Collaboration
- V.F. Cloud-Edge Collaboration
This project is open-source and available under the MIT License. See the LICENSE file for more details.