English | 简体中文
- 🔥 2026-01-10: We upgraded OpenDataArena-scored-data, a collection of over 47 original datasets scored by OpenDataArena-Tool.
- 🔥 2026-01-03: We upgraded OpenDataArena-Tool for multimodal data training and evaluation; see VLM Model Training and VLM Benchmark Evaluation for details on how to train and evaluate VLMs.
- 🔥 2025-12-22: We upgraded OpenDataArena with Qwen3-VL for multimodal data value assessment and 80+ scoring dimensions.
- 🔥 2025-12-17: We released our OpenDataArena Technical Report.
- 2025-07-26: We released the OpenDataArena platform and the OpenDataArena-Tool repository.
OpenDataArena (ODA) is an open, transparent, and extensible platform designed to transform dataset value assessment from guesswork to science. In the era of large language models (LLMs), data is the critical fuel driving model performance — yet its value has long remained a "black box". ODA aims to make every post-training dataset measurable, comparable, and verifiable, enabling researchers to understand what data truly matters.
ODA introduces an open "data arena" where datasets compete under equal training and evaluation conditions, allowing their contribution to downstream model performance to be measured objectively.
Key features of the platform include:
- ODA Leaderboard: The core philosophy of ODA is that data value must be verified through real-world training. By establishing a standardized "proving ground," ODA moves beyond subjective quality assessment to empirical performance tracking.
  - Unified Benchmarking: Evaluates post-training data across multiple domains (General, Math, Code, Science, and Long-Chain Reasoning) and multiple modalities (Text, Image).
  - Standardized Environments: Controls for variables by using fixed model scales (Llama3 / Qwen2 / Qwen3 / Qwen3-VL 7-8B) and consistent training configurations.

- Data Lineage Analysis: Modern datasets often suffer from high redundancy and hidden dependencies. ODA introduces the industry's first Data Lineage Analysis tool to visualize the "genealogy" of open-source data.
  - Structural Modeling: Maps relationships between datasets, including inheritance, mixing, and distillation.
  - Visual Discovery: Provides a "family tree" view to identify core data sources that are repeatedly reused across the community.
  - Contamination Detection: Helps researchers pinpoint potential train-test contamination and "inbreeding" issues, offering a structural explanation for why certain datasets consistently dominate leaderboards.

- Multi-dimensional Data Scoring: Beyond downstream performance, ODA provides a "physical examination" of the data itself. We offer a fine-grained scoring framework that analyzes the intrinsic properties of data samples.
  - Diverse Methodology: Combines model-based evaluation, LLM-as-a-Judge, and heuristic metrics to assess instruction complexity, response quality, and diversity.
  - Massive Open-Source Insights: We have open-sourced scores for over 10 million samples, allowing researchers to understand why a specific dataset is effective.
  - Extensive Metric Library: Supports 80+ scoring dimensions, enabling users to generate comprehensive quality reports with a single click.

- Train–Evaluate–Score Integration: A fully open, reproducible pipeline for model training, benchmark evaluation, and dataset scoring that enables truly meaningful comparisons.

ODA already covers 4+ domains, 20+ benchmarks, and 80+ scoring dimensions; it has processed 120+ datasets, evaluated 40M+ samples, and completed 600+ training runs and 10K+ evaluations, and all of these numbers keep growing.
This repository includes the tools behind the ODA platform:
- Data Scoring: Assesses datasets through diverse metrics and methods, including model-based scoring, LLM-as-a-Judge, and heuristic methods.
- LLM Model Training: Uses LLaMA-Factory to run supervised fine-tuning (SFT) on the datasets. We provide SFT scripts for reproducible experiments on mainstream models and benchmarks (see the sketch after this list).
- LLM Benchmark Evaluation: Uses OpenCompass to evaluate model performance on popular benchmarks from multiple domains (math, code, science, and general instruction). We also provide evaluation scripts for the datasets in ODA.
- VLM Model Training: Uses LLaMA-Factory to run supervised fine-tuning (SFT) of vision-language models on the datasets. We provide SFT scripts for reproducible experiments on mainstream models and benchmarks.
- VLM Benchmark Evaluation: Evaluates the performance of vision-language models on popular benchmarks across multiple domains (Spatial, Reasoning, Infographic, and General) using VLMEvalKit. We also provide evaluation methods for ODA datasets.
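As a rough illustration of how these pieces fit together, the sketch below chains LLaMA-Factory SFT and OpenCompass evaluation from the command line. The two config paths are placeholders rather than files shipped in this repository; the actual scripts and configs are documented in the Model Training and Benchmark Evaluation guides.

```bash
# Sketch only: the config paths below are placeholders, not real files in this repo.

# 1) Supervised fine-tuning with LLaMA-Factory on an ODA dataset
#    (llamafactory-cli is LLaMA-Factory's standard training entry point).
llamafactory-cli train configs/oda_sft_example.yaml

# 2) Evaluate the resulting checkpoint with OpenCompass, pointing an
#    evaluation config at the SFT output directory and the ODA benchmarks.
opencompass configs/oda_eval_example.py
```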
First, clone the repository and its submodules:
```bash
git clone https://github.com/OpenDataArena/OpenDataArena-Tool.git --recursive
cd OpenDataArena-Tool
```

Then, you can start to use the tools in ODA:
- To score your own dataset, please refer to Data Scoring for more details.
- To train the models on the datasets in ODA, please refer to Model Training for more details.
- To evaluate LLMs on the language benchmarks in ODA, please refer to LLM Benchmark Evaluation for more details.
- To evaluate VLMs on the multimodal benchmarks in ODA, please refer to VLM Benchmark Evaluation for more details.
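If the repository was cloned without the --recursive flag used above, the bundled submodules can still be fetched afterwards with standard git commands; nothing ODA-specific is needed:

```bash
# Fetch the submodules after a non-recursive clone.
git submodule update --init --recursive
```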
We thank these outstanding researchers and developers for their contributions to the OpenDataArena project. Collaborations and contributions to the project are welcome!
This project is licensed under the MIT License - see the LICENSE file for details.
If you find this project useful, please consider citing:
```bibtex
@article{cai2025opendataarena,
  title={OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value},
  author={Cai, Mengzhang and Gao, Xin and Li, Yu and Lin, Honglin and Liu, Zheng and Pan, Zhuoshi and Pei, Qizhi and Shang, Xiaoran and Sun, Mengyuan and Tang, Zinan and others},
  journal={arXiv preprint arXiv:2512.14051},
  year={2025}
}

@misc{opendataarena_tool_2025,
  author = {OpenDataArena},
  title = {{OpenDataArena-Tool}},
  year = {2025},
  howpublished = {\url{https://github.com/OpenDataArena/OpenDataArena-Tool}},
  note = {GitHub repository}
}
```