
OpenDataArena-Tool


English | 简体中文

What's New

Overview

OpenDataArena (ODA) is an open, transparent, and extensible platform designed to transform dataset value assessment from guesswork to science. In the era of large language models (LLMs), data is the critical fuel driving model performance — yet its value has long remained a "black box". ODA aims to make every post-training dataset measurable, comparable, and verifiable, enabling researchers to understand what data truly matters.

ODA introduces an open "data arena" where datasets compete under equal training and evaluation conditions, allowing their contribution to downstream model performance to be measured objectively.

Key features of the platform include:

  1. ODA Leaderboard: The core philosophy of ODA is that data value must be verified through real-world training. By establishing a standardized "proving ground," ODA moves beyond subjective quality assessment to empirical performance tracking.
  • Unified Benchmarking: Evaluates post-training data across multiple domains (General, Math, Code, Science, and Long-Chain Reasoning) and multiple modalities (Text, Image).
  • Standardized Environments: Controls for variables by using fixed model scales (Llama3 / Qwen2 / Qwen3 / Qwen3-VL, 7-8B) and consistent training configurations.
  2. Data Lineage Analysis: Modern datasets often suffer from high redundancy and hidden dependencies. ODA introduces the industry's first Data Lineage Analysis tool to visualize the "genealogy" of open-source data.
  • Structural Modeling: Maps relationships between datasets, including inheritance, mixing, and distillation.
  • Visual Discovery: Provides a "family tree" view to identify core data sources that are repeatedly reused across the community.
  • Contamination Detection: Helps researchers pinpoint potential train-test contamination and "inbreeding" issues, offering a structural explanation for why certain datasets consistently dominate leaderboards.
  3. Multi-dimensional Data Scoring: Beyond downstream performance, ODA provides a "physical examination" of the data itself. We offer a fine-grained scoring framework that analyzes the intrinsic properties of data samples (a minimal heuristic-scoring sketch appears after this list).
  • Diverse Methodology: Combines model-based evaluation, LLM-as-a-Judge, and heuristic metrics to assess instruction complexity, response quality, and diversity.
  • Massive Open-Source Insights: We have open-sourced scores for over 10 million samples, allowing researchers to understand why a specific dataset is effective.
  • Extensive Metric Library: Supports 80+ scoring dimensions, enabling users to generate comprehensive quality reports with a single click.
  4. Train–Evaluate–Score Integration: A fully open, reproducible pipeline for model training, benchmark evaluation, and dataset scoring, enabling truly meaningful comparisons.

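To illustrate the heuristic-metric family mentioned above, the sketch below computes two simple per-sample signals (response length and lexical diversity) for an instruction-response pair. It is a minimal, hypothetical example; the field names, metrics, and thresholds are illustrative and do not reflect ODA's actual scoring implementation.

# Hypothetical heuristic signals for one instruction-response pair (not ODA's actual metrics).
def heuristic_scores(sample: dict) -> dict:
    response = sample.get("output", "")
    tokens = response.split()  # whitespace tokenization as a rough proxy
    length = len(tokens)
    # Type-token ratio: fraction of unique tokens, a crude diversity signal.
    diversity = len(set(tokens)) / length if length else 0.0
    return {"response_length": length, "lexical_diversity": round(diversity, 3)}

sample = {"instruction": "Explain gradient descent.",
          "output": "Gradient descent iteratively updates parameters along the negative gradient."}
print(heuristic_scores(sample))
# {'response_length': 9, 'lexical_diversity': 1.0}
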
ODA already covers 4+ domains, 20+ benchmarks, and 80+ scoring dimensions; it has processed 120+ datasets, evaluated 40M+ samples, and completed 600+ training runs and 10K+ evaluations, with all of these numbers continuing to grow.

OpenDataArena-Tool

This repository includes the tools for the ODA platform:

  • Data Scoring: Assesses datasets with diverse metrics and methods, including model-based scoring, LLM-as-a-Judge, and heuristic metrics.
  • LLM Model Training: Uses LLaMA-Factory to perform supervised fine-tuning (SFT) of language models on the datasets. We provide SFT scripts for reproducible experiments on mainstream models and benchmarks (an Alpaca-style data sample is sketched after this list).
  • LLM Benchmark Evaluation: Uses OpenCompass to evaluate model performance on popular benchmarks from multiple domains (math, code, science, and general instruction). We also provide evaluation scripts for the datasets in ODA.
  • VLM Model Training: Uses LLaMA-Factory to perform supervised fine-tuning (SFT) of vision-language models on the datasets. We provide SFT scripts for reproducible experiments on mainstream models and benchmarks.
  • VLM Benchmark Evaluation: Uses VLMEvalKit to evaluate vision-language models on popular benchmarks across multiple domains (Spatial, Reasoning, Infographic, and General). We also provide evaluation methods for ODA datasets.
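
For SFT with LLaMA-Factory, datasets are commonly expressed as Alpaca-style JSON records such as the illustrative sample below. This is only a sketch of that common format, not necessarily the exact schema used by ODA datasets; consult the Model Training documentation for the authoritative format.

[
  {
    "instruction": "Summarize the following paragraph.",
    "input": "OpenDataArena evaluates post-training datasets under fixed training and evaluation settings.",
    "output": "OpenDataArena measures dataset value by training and evaluating models under standardized conditions."
  }
]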

Quick Start

First, clone the repository and its submodules:

git clone https://github.com/OpenDataArena/OpenDataArena-Tool.git --recursive
cd OpenDataArena-Tool
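
If you cloned without the --recursive flag, the submodules can usually be fetched afterwards with:

git submodule update --init --recursive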

Then, you can start to use the tools in ODA:

  • To score your own dataset, please refer to Data Scoring for more details.
  • To train the models on the datasets in ODA, please refer to Model Training for more details.
  • To evaluate the LLM models on the language benchmarks in ODA, please refer to LLM Benchmark Evaluation for more details.
  • To evaluate the VLM models on the multimodal benchmarks in ODA, please refer to VLM Benchmark Evaluation for more details.

Contributors

We thank these outstanding researchers and developers for their contributions to the OpenDataArena project. You are welcome to collaborate on and contribute to the project!

Xiaoyang Wang Qizhi Pei Mengzhang Cai Zinan Tang Yu Li Mengyuan Sun Honglin Lin Xin Gao

Lijun Wu Zhuoshi Pan Chenlin Ming Zhanping Zhong Conghui He

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you find this project useful, please consider citing:

@article{cai2025opendataarena,
  title={OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value},
  author={Cai, Mengzhang and Gao, Xin and Li, Yu and Lin, Honglin and Liu, Zheng and Pan, Zhuoshi and Pei, Qizhi and Shang, Xiaoran and Sun, Mengyuan and Tang, Zinan and others},
  journal={arXiv preprint arXiv:2512.14051},
  year={2025}
}

@misc{opendataarena_tool_2025,
  author       = {OpenDataArena},
  title        = {{OpenDataArena-Tool}},
  year         = {2025},
  url          = {https://github.com/OpenDataArena/OpenDataArena-Tool},
  note         = {GitHub repository},
  howpublished = {\url{https://github.com/OpenDataArena/OpenDataArena-Tool}},
}
