LLaMA-O1: Open Large Reasoning Model Frameworks For Training, Inference and Evaluation With PyTorch and HuggingFace
Towards Open-Source Large Reasoning Models
The first version of LLaMA-O1 is now available on Hugging Face! Here we come!
Supervised:
Base(Pretrain):
Supervised Finetune Dataset:
Pretraining Dataset:
RLHF is on the way! View our GitHub Repo:
Our ongoing related research:
Online Demo (CPU-only): https://huggingface.co/spaces/SimpleBerry/LLaMA-O1-Supervised-1129-Demo
- Marked Language of Long CoT (Done)
- Pretrain Dataset (Done)
- Supervised Dataset (Done)
- PRM token rectification Dataset (Done)
- Reinforcement Learning With Self-Play (Code done, training in progress)
- Inference-time Reasoning Enhancement Frameworks (Code done, temporarily postponed)