Can I Run AI locally?

Meta · 8B · Llama 3.1 Community

Meta's versatile 8B — great quality/speed ratio

4.6 GB · 128K ctx ·

Qwen 3.5 9B

Alibaba · 9B · Apache 2.0

Multimodal Qwen 3.5 mid-size

5.1 GB · 32K ctx ·

Phi-4 14B

Microsoft · 14B · MIT

Microsoft's reasoning-focused model

7.7 GB · 16K ctx ·

GPT-OSS 20B

7mo ago

OpenAI · 21B · Apache 2.0

OpenAI's open-weight MoE with configurable reasoning

11.3 GB · 128K ctx ·

Mistral Small 3.1 24B

Mistral AI · 24B · Apache 2.0

Multimodal Mistral with vision support

12.8 GB · 128K ctx ·

Gemma 3 27B

Google · 27B · Gemma

Google's flagship Gemma 3 model

14.3 GB · 128K ctx ·

Qwen 2.5 Coder 32B

Alibaba · 32B · Apache 2.0

Best open-source coding model at release

Qwen 3 32B

Alibaba · 32B · Apache 2.0

Qwen 3 flagship dense model

DeepSeek R1 Distill 32B

DeepSeek · 32B · MIT

R1 reasoning distilled into Qwen 32B — sweet spot

16.9 GB · 64K ctx ·

Llama 3.3 70B

Llama 3.3 Community

Meta · 70B · Llama 3.3 Community

Best open model at 70B class

36.4 GB · 128K ctx ·

Llama 4 Scout 17B

Llama 4 Community

Meta · 109B · Llama 4 Community

MoE with 16 experts, 17B active params

56.3 GB · 128K ctx ·

GPT-OSS 120B

7mo ago

OpenAI · 117B · Apache 2.0

OpenAI's flagship open-weight MoE — 52.6% SWE-bench

60.4 GB · 128K ctx ·

Devstral 2 123B

MRL

3mo ago

Mistral AI · 123B · MRL

Dense 123B coding model — 72.2% SWE-bench Verified

63.5 GB · 256K ctx ·

DeepSeek R1

DeepSeek · 671B · MIT

Massive MoE reasoning model — 37B active

344.2 GB · 64K ctx ·

DeepSeek V3.2

3mo ago

DeepSeek · 685B · MIT

State-of-the-art MoE — 37B active params

351.4 GB · 128K ctx ·

Kimi K2

Kimi

Moonshot AI · 1T · Kimi

1T-param MoE with 384 experts — 32B active, strong agentic coding

512.7 GB · 128K ctx ·

All models

Qwen 3.5 0.8B

Alibaba · 0.8B · Apache 2.0

Ultra-tiny model for embedded and edge

0.9 GB · 32K ctx ·

Llama 3.2 1B

Llama 3.2 Community

Meta · 1B · Llama 3.2 Community

Meta's smallest Llama for edge devices

1 GB · 128K ctx ·

Gemma 3 1B

Google · 1B · Gemma

Google's tiny Gemma for on-device

1 GB · 32K ctx ·

TinyLlama 1.1B

Community · 1.1B · Apache 2.0

Ultralight model for constrained devices

1.1 GB · 2K ctx ·

Qwen 2.5 Coder 1.5B

Alibaba · 1.5B · Apache 2.0

Ultra-lightweight coding model

1.3 GB · 32K ctx ·

DeepSeek R1 1.5B

DeepSeek · 1.5B · MIT

Tiny reasoning model distilled from R1

1.3 GB · 64K ctx ·

Qwen 3 1.7B

Alibaba · 1.7B · Apache 2.0

Compact multilingual Qwen 3

1.4 GB · 32K ctx ·

Qwen 3.5 2B

Alibaba · 2B · Apache 2.0

Small multimodal Qwen 3.5

1.5 GB · 32K ctx ·

Gemma 2 2B

Google · 2B · Gemma

Google's compact open model

1.5 GB · 8K ctx ·

Llama 3.2 3B

Llama 3.2 Community

Meta · 3B · Llama 3.2 Community

Lightweight Llama for mobile and edge

2 GB · 128K ctx ·

SmolLM3 3B

HuggingFace · 3B · Apache 2.0

Lightweight multilingual reasoning

2 GB · 128K ctx ·

Phi-3.5 Mini

Microsoft · 3.8B · MIT

Microsoft's efficient small model with long context

2.4 GB · 128K ctx ·

Phi-4 Mini Reasoning

Microsoft · 3.8B · MIT

Lightweight reasoning model

2.4 GB · 16K ctx ·

Qwen 3 4B

Alibaba · 4B · Apache 2.0

Compact Qwen 3 for general tasks

2.5 GB · 32K ctx ·

Gemma 3 4B

Google · 4B · Gemma

Multimodal Gemma with 128K context

2.5 GB · 128K ctx ·

Qwen 3.5 4B

Alibaba · 4B · Apache 2.0

Small multimodal Qwen 3.5

2.5 GB · 32K ctx ·

Mistral 7B v0.3

Mistral AI · 7B · Apache 2.0

High-quality 7B with sliding window attention

4.1 GB · 32K ctx ·

Qwen 2.5 7B

Alibaba · 7B · Apache 2.0

Strong multilingual and coding capabilities

4.1 GB · 128K ctx ·

Qwen 2.5 Coder 7B

Alibaba · 7B · Apache 2.0

Dedicated coding model

4.1 GB · 128K ctx ·

DeepSeek R1 Distill 7B

DeepSeek · 7B · MIT

R1 reasoning distilled into Qwen 7B

4.1 GB · 64K ctx ·

Qwen 3 8B

Alibaba · 8B · Apache 2.0

Qwen 3 with thinking mode support

4.6 GB · 128K ctx ·

Ministral 8B

MRL

Mistral AI · 8B · MRL

Mistral's efficient 8B model

4.6 GB · 32K ctx ·

Gemma 2 9B

Google · 9B · Gemma

Google's best mid-size open model

5.1 GB · 8K ctx ·

GLM-4 9B

GLM-4

Zhipu AI · 9B · GLM-4

Multilingual model supporting 26 languages with 128K context

5.1 GB · 128K ctx ·

Nemotron Nano 9B v2

NVIDIA Open

9mo ago

NVIDIA · 9B · NVIDIA Open

Hybrid Mamba2 architecture for reasoning

5.1 GB · 128K ctx ·

Llama 3.2 11B Vision

Llama 3.2 Community

Meta · 11B · Llama 3.2 Community

Multimodal vision and text model

6.1 GB · 128K ctx ·

Gemma 3 12B

Google · 12B · Gemma

Multimodal Gemma with 128K context

6.6 GB · 128K ctx ·

Mistral Nemo 12B

Mistral AI · 12B · Apache 2.0

Multilingual 12B with 128K context

6.6 GB · 128K ctx ·

Qwen 2.5 14B

Alibaba · 14B · Apache 2.0

Excellent quality for its size class

7.7 GB · 128K ctx ·

Qwen 3 14B

Alibaba · 14B · Apache 2.0

Strong all-rounder with thinking mode

7.7 GB · 128K ctx ·

DeepSeek R1 Distill 14B

DeepSeek · 14B · MIT

R1 reasoning distilled into Qwen 14B

7.7 GB · 64K ctx ·

LFM2 24B

Liquid AI

4mo ago

Liquid AI · 24B · Liquid AI

Hybrid MoE with convolution+attention layers — 2.3B active

12.8 GB · 32K ctx ·

Devstral Small 2 24B

3mo ago

Mistral AI · 24B · Apache 2.0

Coding-focused model with 256K context — 68% SWE-bench

12.8 GB · 256K ctx ·

Gemma 2 27B

Google · 27B · Gemma

Google's largest Gemma 2 model

14.3 GB · 8K ctx ·

Qwen 3.5 27B

Alibaba · 27.8B · Apache 2.0

Flagship native multimodal Qwen 3.5

14.7 GB · 256K ctx ·

Qwen 3 30B-A3B

Alibaba · 30B · Apache 2.0

MoE with only 3.3B active — extremely efficient

15.9 GB · 128K ctx ·

Nemotron 3 Nano 30B

NVIDIA Open

9mo ago

NVIDIA · 30B · NVIDIA Open

MoE with 1M context and 3B active

15.9 GB · 1024K ctx ·

Qwen 2.5 32B

Alibaba · 32B · Apache 2.0

High-quality reasoning and multilingual

EXAONE 4.0 32B

EXAONE AI

LG AI · 32B · EXAONE AI

Hybrid reasoning, multilingual

OLMo 2 32B

Allen AI · 32B · Apache 2.0

Fully open research model by Allen AI

16.9 GB · 4K ctx ·

Command R 35B

CC BY-NC 4.0

Cohere · 35B · CC BY-NC 4.0

Optimized for retrieval-augmented generation

18.4 GB · 128K ctx ·

Qwen 3.5 35B-A3B

Alibaba · 35B · Apache 2.0

Efficient multimodal MoE with 3B active

18.4 GB · 256K ctx ·

Mixtral 8x7B

Mistral AI · 47B · Apache 2.0

MoE with 12.9B active params

24.6 GB · 32K ctx ·

Qwen 2.5 72B

Qwen

Alibaba · 72B · Qwen

Alibaba's flagship open model

37.4 GB · 128K ctx ·

Qwen 3.5 122B-A10B

Alibaba · 122B · Apache 2.0

Large multimodal MoE with 10B active

63 GB · 256K ctx ·

Mixtral 8x22B

Mistral AI · 141B · Apache 2.0

Large MoE with 39B active params

72.7 GB · 64K ctx ·

Qwen 3 235B-A22B

Alibaba · 235B · Apache 2.0

Massive MoE with 22B active — frontier quality

120.9 GB · 128K ctx ·

Qwen 3.5 397B-A17B

Alibaba · 397B · Apache 2.0

Largest multimodal Qwen 3.5 MoE

203.9 GB · 256K ctx ·

Llama 4 Maverick 17B-128E

Llama 4 Community

Meta · 400B · Llama 4 Community

Multimodal MoE with 128 experts — 17B active, 1M context

205.4 GB · 1024K ctx ·

Llama 3.1 405B

Llama 3.1 Community

Meta · 405B · Llama 3.1 Community

Largest open-weight dense model by Meta

208 GB · 128K ctx ·

Qwen 3 Coder 480B

Alibaba · 480B · Apache 2.0

Largest open coding MoE — 35B active

246.4 GB · 256K ctx ·

DeepSeek V3.1