A curated collection of Hebrew language AI models available on Hugging Face as of April 17, 2025. This repository aims to provide a good starting point for anyone looking to work with Hebrew language AI models.
Hebrew is a Semitic language with approximately 9 million speakers worldwide, primarily in Israel. Despite its relatively small speaker base, Hebrew presents several interesting characteristics for AI research:
- Modern vs Biblical Hebrew: There are significant differences between Modern Hebrew and Biblical Hebrew, with specialized models developed for biblical text analysis.
- Punctuation Challenges: Modern written Hebrew typically lacks extensive punctuation, creating a need for specialized models that can infer and add appropriate punctuation.
- Technological Hub: Israel is a renowned center for technology and AI research, making Hebrew language AI models particularly interesting from an experimental and innovation perspective.
- Rich Linguistic Structure: Hebrew's non-Latin script, right-to-left writing system, and complex morphology present unique challenges for language models.
These factors make Hebrew language AI development both challenging and valuable, with applications ranging from biblical text analysis to modern NLP tasks.
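The script-level points above can be seen directly in how Hebrew text is encoded. The following is a minimal sketch using only Python's standard `unicodedata` module: it strips Hebrew vowel points and cantillation marks (which Unicode encodes as combining marks) and checks whether a string contains right-to-left characters. The function names `strip_niqqud` and `contains_rtl` are illustrative and do not come from any model or library listed here.

```python
import unicodedata


def strip_niqqud(text: str) -> str:
    """Remove Hebrew vowel points and cantillation marks from text.

    Only nonspacing marks (category "Mn") inside U+0591..U+05C7 are
    dropped, so accents in other scripts are left untouched.
    """
    # NFD splits precomposed presentation forms into base letter + marks.
    decomposed = unicodedata.normalize("NFD", text)
    kept = [
        ch
        for ch in decomposed
        if not ("\u0591" <= ch <= "\u05C7" and unicodedata.category(ch) == "Mn")
    ]
    # NFC recomposes any non-Hebrew characters that NFD split apart.
    return unicodedata.normalize("NFC", "".join(kept))


def contains_rtl(text: str) -> bool:
    """Return True if any character has a right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


# "shalom" written with vowel points, given as escapes to avoid ambiguity.
pointed = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"
plain = strip_niqqud(pointed)
```

A preprocessing step like this is sometimes applied before tokenization so that pointed and unpointed spellings of the same word map to the same input; whether a given model expects it is something to check in that model's card.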
- Hebrew-LLMs
- About Hebrew Language and AI
- Table of Contents
- Large Language Models (LLMs)
- Niche Text Models
- Specialized Language Models
- ASR Models (Speech Recognition)
- TTS Models (Text-to-Speech)
- Benchmarks and Leaderboards
- Organizations to Follow
- Other Interesting Projects
- Additional Links
- Worthy Follow
- Reading
- Resources
| Model | Link |
|---|---|
| Hebrew-Mistral-7B | |
| Hebrew-Mistral-7B-200K | |
| Hebrew-Mistral-7B_Chat-GGUF | |
| Hebrew-Mistral-7B-Instruct-v0.1-GGUF | |
| Model | Link |
|---|---|
| Hebrew-Mixtral-8x22B | |
| Model | Link |
|---|---|
| Hebrew-Gemma-11B | |
| Hebrew-Gemma-11B-Instruct | |
| Hebrew-Gemma-11B-V2-mlx-4bit | |
Note: This section will be populated with Hebrew TTS models in the future.
The Hebrew LLM Leaderboard provides valuable insights into the performance of various models on Hebrew language tasks:
| Resource | Link |
|---|---|
| Hebrew LLM Leaderboard | |
| Hebrew Question Answering Dataset | |
An interesting observation from the leaderboard is that large multilingual LLMs (like Mistral and Meta-Llama models) generally outperform specialized Hebrew models due to their significantly larger parameter counts. However, specialized Hebrew models still appear on the leaderboard and perform reasonably well considering their size constraints.
The benchmark evaluates models across several categories:
- SNLI (Natural Language Inference)
- QA (Question Answering)
- TLNLS (Text Classification)
- Sentiment Analysis
- Winograd Schema Challenge
- Translation
- Israeli Trivia (a unique category testing cultural and local knowledge)
Taken together, these categories give a broad view of model capabilities in Hebrew.
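To compare models across categories like these, one simple option is an unweighted mean of the per-category scores. The sketch below assumes that approach; it is not a description of how the Hebrew LLM Leaderboard actually aggregates results, and the model names and numbers are placeholders rather than real scores.

```python
from statistics import mean


def macro_average(category_scores: dict[str, float]) -> float:
    """Unweighted mean of per-category scores (an assumed aggregation)."""
    if not category_scores:
        raise ValueError("no category scores given")
    return mean(category_scores.values())


def rank_models(results: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Sort models by macro-average score, best first."""
    averaged = {name: macro_average(scores) for name, scores in results.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


# Placeholder numbers for illustration only; not real leaderboard results.
example_results = {
    "model_a": {"SNLI": 80.0, "QA": 70.0, "Sentiment": 60.0},
    "model_b": {"SNLI": 60.0, "QA": 50.0, "Sentiment": 40.0},
}
ranking = rank_models(example_results)
```

An unweighted mean treats every category as equally important. If a use case depends mostly on one task, such as question answering, comparing that single column may be more informative than the average.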
| Organization | Link |
|---|---|
| Dicta | |
| MAFAT (National Natural Language Processing Plan Of Israel) | |
| Resource | Link |
|---|---|
| Best LLM for Hebrew Classification | |
| Hebrew LLM Paper | |
| Hebrew Model Sentiment Analysis | |
| Huggingface Hebrew Leaderboard | |
| Hebrew GPT Neo XL | |